2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS): Latest Papers


A fast integral image generation algorithm on GPUs

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097862

Qingqing Dang, Shengen Yan, Ren Wu

An integral image, also known as a summed-area table, is a two-dimensional table generated from an input image. Each entry in the table stores the sum of all pixels above and to the left of that entry's position in the input image. Integral images are widely used in computer vision and computer graphics applications; in real-time computer vision in particular, they are used to accelerate computing the sum over a rectangular area. Integral image generation is memory-bound. There are two typical existing integral image algorithms on GPUs. The first is the Scan-Scan algorithm. The second is the Scan-Transpose-Scan algorithm, which generates the integral image in three steps: the first and third steps are scans, and a transpose step is added so that the third step achieves coalesced global memory access. In this paper, we propose a novel blocked integral algorithm with three stages: intra-block reduction, auxiliary matrix scan, and intra-block scan. Compared with the Scan-Scan algorithm, our proposed scheme reduces global memory accesses, requires fewer local synchronizations, and suffers less load imbalance. Compared with the Scan-Transpose-Scan algorithm, our proposed algorithm needs only about half of the global memory accesses while still achieving coalesced memory access. We implemented all three algorithms in OpenCL so that they can run on both Nvidia and AMD GPUs. We also designed an auto-tuning framework to search for optimal parameters for different input matrix sizes on those two platforms. The experimental results show that our proposed algorithm achieves the best performance of the three.
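The definition in this abstract can be illustrated with a minimal sequential sketch. It is a plain CPU reference in Python, not the paper's blocked GPU algorithm or its OpenCL code; the function names `integral_image` and `rect_sum` are hypothetical and chosen only for this illustration.

```python
from typing import List

Matrix = List[List[int]]


def integral_image(img: Matrix) -> Matrix:
    """Sequential reference: out[y][x] = sum of img[0..y][0..x], inclusive."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            # Add the table entry directly above, which already covers rows 0..y-1.
            out[y][x] = row_sum + (out[y - 1][x] if y > 0 else 0)
    return out


def rect_sum(sat: Matrix, top: int, left: int, bottom: int, right: int) -> int:
    """Sum over the inclusive rectangle using at most four table lookups."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]  # this corner was subtracted twice
    return total
```

Once the table is built, the sum over any rectangle costs a constant number of lookups regardless of its size, which is why the table is attractive for real-time vision workloads.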


Citations: 3

Towards social botnet behavior detecting in the end host

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097824

Yuede Ji, Yukun He, Xinyang Jiang, Qiang Li

Social botnets that use an online social network (OSN) as their Command and Control (C&C) channel pose an enormous threat to Internet security. Server-side detection approaches mainly target suspicious accounts and cannot identify the specific bot hosts or processes. Host-side approaches target suspicious process behaviors but are not robust enough to cope with frequent variants and novel social bots. In this paper, we propose a novel approach to detecting social bot behavior in the end host. Because social bot binaries and source code are not easy to collect, we first design a novel social botnet, named wbbot, based on Sina Weibo, and analyze it from two aspects: wbbot architecture and wbbot behaviors. Second, we analyze the host behaviors of existing social botnets obtained from public websites, other researchers, and our own implementations. We identify six critical phases: infection, pre-defined host behaviors, establishment of C&C, receipt of the botmaster's commands, execution of social bot commands, and return of the results. Third, we present our detection system, which consists of three components: a host behavior monitor, a host behavior analyzer, and a detection approach. We present a behavior tree-based approach to detecting social bots: after constructing the suspicious behavior tree, we match it against the template library to generate the detection result. Finally, we collect real-world social botnet traces to evaluate performance, and we would like to share them for academic research. The results indicate that our system has an acceptable false positive rate of 29.6% and a remarkable false negative rate of 4.5%. Compared with other detection tools, our detection result is still remarkable.


Citations: 8

Continuous similarity join on data streams

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097853

Jia Cui, Weiping Wang, Dan Meng, Zhenyan Liu

Similarity join plays an important role in many applications, such as data cleaning and integration, where it addresses the problem of poor data quality. Most existing studies focus on performing similarity join on static datasets, and few run it on dynamic data streams. With the development of network technology, the data access paradigm has shifted from a disk-oriented mode to online data streams, which makes similarity join in continuous queries over data streams a novel query processing paradigm. Unlike a static dataset, a data stream is unbounded, continuous, and unpredictable. These differences pose serious challenges, such as real-time query performance. To this end, we study the problem of continuous similarity join on data streams, based on the edit distance metric and a filter-and-verify framework with sliding-window semantics. Two subcases of this problem are studied: self similarity join on a single data stream and similarity join on two streams. We introduce a sliding window model built on basic windows to facilitate updating the sliding window and its index. We then discuss the details of our method, including signature extraction schemes, filtering and verification algorithms, and re-evaluation strategies. Finally, extensive experimental results show that our method works efficiently on real data streams.
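The filter-and-verify idea with sliding-window semantics can be sketched for the single-stream (self-join) case as follows. This is a simplified illustration and not the authors' method: it assumes a count-based window, uses a plain length filter in place of the paper's signature extraction schemes, and omits the basic-window index. The names `edit_distance_within` and `self_join_stream` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def edit_distance_within(a: str, b: str, tau: int) -> bool:
    """Verify step: True if the Levenshtein distance between a and b is <= tau."""
    if abs(len(a) - len(b)) > tau:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        if min(cur) > tau:
            return False  # every alignment already costs more than tau
        prev = cur
    return prev[-1] <= tau


def self_join_stream(stream: List[str], window: int, tau: int) -> List[Tuple[str, str]]:
    """Join each arriving string against the last `window` strings (assumed count-based window)."""
    recent: Deque[str] = deque(maxlen=window)
    matches: List[Tuple[str, str]] = []
    for s in stream:
        for t in recent:
            # Filter step (an assumption here): lengths differing by more than tau cannot match.
            if abs(len(s) - len(t)) <= tau and edit_distance_within(s, t, tau):
                matches.append((t, s))
        recent.append(s)  # the oldest string is evicted automatically
    return matches
```

The cheap filter discards most candidate pairs before the quadratic verification runs; a real system would replace the linear scan over the window with an index over signatures.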


Citations: 2

Construct a simply and quickly platform to solving linear systems

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097894

Chih-Wei Hsieh, Yu-Fen Cheng, C. Chou

Scientific computing is an important area of research for industry and society, and linear systems are increasingly important within it. However, linear system solvers come in many combinations, and rapidly selecting the best method for solving a given matrix is expensive. In this paper, we present a linear system solver platform that offers users an easy and quick interface.


Citations: 0

Transmission characteristics of hybrid structure yarns for e-textiles

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097909

Minki Choi, Jooyong Kim

Twisted Copper Filaments (TCF) were made by a yarn covering process in order to transmit signals and power for electronic textiles. The 560-denier polyurethane filaments were covered in the S-twist direction by urethane-coated copper wires. The resonance frequency of the final filaments was found to change, mainly because of the change in dielectric properties, and thus capacitance, caused by the PET covering. It was concluded that while resonance frequency was primarily determined by filament length and the dielectric constant of the covering yarns, S11 and S21 were mainly determined by measurement length and ply number.


Citations: 0

HARP: Towards enhancing data recency for eventually consistent data stores

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097870

Yu Tang, Hailong Sun, Xu Wang, Xudong Liu

To attain high performance and remain available during network partitions or node failures, modern distributed systems often sacrifice recency guarantees, which can provide a uniform view of recent versions of data items for different clients. In this work, we consider the problem of increasing the probability of data recency while preserving low response latency and maintaining high availability on top of an eventually consistent data store. To solve the problem, we propose HARP, an approach that enhances data recency in a highly available way. Based on HARP, we implement an agent layer that detects stale reads and resolves conflicts, and we build a data storage system by leveraging widely deployed data store technologies. We compare the prototype system with Cassandra and show experimentally that our method adds low overhead (less than 10%) relative to the eventually consistent configuration and, for most workloads, achieves better performance than Cassandra's strong "read your writes" configurations.


Citations: 1

FENet: An SDN-based scheme for virtual network management

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097815

Kun Liu, Tianyu Wo, Lei Cui, Bin Shi, Jie Xu

Virtual networking is vital to efficient resource management in Clouds, and it is in fact one of the main services provided by many Cloud Computing platforms. Virtual network management needs to meet specific requirements, including tenant isolation and adaptation to the virtual machine lifecycle. Most existing schemes for virtual network management rely on overlay networks in order to achieve a desirable degree of flexibility. However, these schemes share a common limitation: a relatively high performance penalty due to a complicated forwarding process. We address this performance concern by developing a new management scheme, FENet, which uses Software-Defined Networks (SDN) to create virtual networks and manage them via SDN controller programs. We present the design of an SDN controller, with the definition of flow entry rules based on the OpenFlow protocol and the specification of a routing algorithm. The results of our experimental evaluation show that our SDN-based prototype can control virtual network interconnections and tenant isolation appropriately. FENet achieves about 30% better network performance than the management scheme based on OpenVPN and lower latency than the traditional bridging scheme.


Citations: 3

Accelerating the iterative linear solver for reservoir simulation on multicore architectures

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097817

Wei Wu, Xiang Li, Lei He, Dongxiao Zhang

Modern petroleum reservoir simulation serves as a primary tool for quantitatively managing reservoir production and planning new fields. It involves repeatedly solving the Jacobian of a set of strongly nonlinear partial differential equations governing mass and energy conduction and conservation. Most existing reservoir simulators adopt an iterative solver with multiple stages of preconditioners, in which incomplete LU (ILU) factorization is an outstanding universal smoother. However, when the number of degrees of freedom per grid cell grows, ILU usually becomes the bottleneck of the solver. Moreover, ILU is difficult to parallelize due to its inherent data dependency. In this paper, we develop a sparse iterative solver with parallelized ILU and triangular solve using a block-wise data structure. On 14 industrial reservoir simulation matrices, the proposed ILU is on average 5.2x faster than the state-of-the-art iterative solver because of the block-wise data structure, which leads to a 2.2x speedup in total solver runtime. In addition, parallel ILU and triangular solve are developed to further accelerate the solver. To tackle the strong data dependency in ILU and triangular solve, we first partition the algorithm into separate tasks and construct a data flow graph to represent the data dependency. Tasks are then scheduled in parallel according to the topological order of the data flow graph. On an 8-thread multicore architecture, we achieve a further 3.6x speedup on ILU factorization and 3.3x on triangular solve, with good scalability.
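The scheduling idea in this abstract, ordering dependent tasks by the topological order of a data flow graph, can be sketched for a sparse lower-triangular solve. This is a scalar, sequential illustration under stated assumptions, not the authors' block-wise implementation: each row is treated as one task, rows are stored as dictionaries of nonzeros, and the names `dependency_levels` and `solve_lower` are hypothetical.

```python
from typing import Dict, List

Row = Dict[int, float]  # column index -> nonzero value, diagonal included


def dependency_levels(lower: List[Row]) -> List[List[int]]:
    """Group rows into levels; a row sits one level after the deepest row it reads."""
    level_of = [0] * len(lower)
    for i, row in enumerate(lower):
        deps = [level_of[j] for j in row if j < i]
        level_of[i] = 1 + max(deps) if deps else 0
    levels: List[List[int]] = [[] for _ in range(max(level_of, default=-1) + 1)]
    for i, lvl in enumerate(level_of):
        levels[lvl].append(i)
    return levels


def solve_lower(lower: List[Row], b: List[float]) -> List[float]:
    """Solve L x = b level by level (forward substitution in topological order)."""
    x = [0.0] * len(b)
    for level in dependency_levels(lower):
        # Rows within one level do not depend on each other, so a parallel
        # runtime could hand them to different threads; here they run in a loop.
        for i in level:
            acc = sum(v * x[j] for j, v in lower[i].items() if j < i)
            x[i] = (b[i] - acc) / lower[i][i]
    return x
```

The number of levels bounds the length of the sequential critical path, so matrices whose dependency graph is shallow and wide expose the most parallelism.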


Citations: 1

Energy-aware multipath routing for data aggregation in wireless sensor networks

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097890

Yingyuan Xiao, Xinrong Zhao, Hongya Wang, Ching-Hsien Hsu

Data aggregation is widely used in wireless sensor networks to collect data in an energy-efficient manner, eliminating redundant data transmission and thereby prolonging network lifetime. To meet the data aggregation needs of wireless sensor networks, this paper proposes a novel multi-path routing algorithm, called EAD, for in-network data aggregation. For each sensor on the routing paths, EAD evaluates its neighbors based on residual energy, deviation angle, and distance, and selects the k neighbors with the minimal evaluation costs as its forwarding nodes, in order to balance the energy consumption of the wireless sensor network while ensuring reliability and performance. Simulation results show that, by adjusting the weight of each influencing factor, EAD can effectively prolong network lifetime, reduce latency, and ensure reliability.
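The neighbor selection step described here can be sketched as a weighted cost over the three factors followed by a top-k pick. The abstract does not give EAD's actual cost function, so everything below is an assumption made for illustration: the linear weighted sum, the normalization, the default weights, the interpretation of distance as distance from the current node, and the names `Neighbor`, `evaluation_cost`, and `select_forwarders`.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Neighbor:
    node_id: str
    residual_energy: float  # joules remaining
    deviation_angle: float  # radians away from the direction of the sink
    distance: float         # metres (assumed: measured from the current node)


def evaluation_cost(n: Neighbor, max_energy: float, radio_range: float,
                    w_energy: float = 0.5, w_angle: float = 0.3,
                    w_distance: float = 0.2) -> float:
    """Hypothetical cost: low energy, large deviation and long hops are all penalized."""
    energy_term = 1.0 - n.residual_energy / max_energy
    angle_term = n.deviation_angle / math.pi
    distance_term = n.distance / radio_range
    return w_energy * energy_term + w_angle * angle_term + w_distance * distance_term


def select_forwarders(neighbors: List[Neighbor], k: int,
                      max_energy: float, radio_range: float) -> List[Neighbor]:
    """Return the k neighbors with the smallest evaluation cost, cheapest first."""
    return heapq.nsmallest(
        k, neighbors, key=lambda n: evaluation_cost(n, max_energy, radio_range))
```

Changing the three weights shifts the balance between lifetime, path directness, and hop length, which corresponds to the weight adjustment mentioned in the simulation results.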


Citations: 3

Energy-efficient mobile data collection in energy-harvesting wireless sensor networks

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097791

Cong Wang, Songtao Guo, Yuanyuan Yang

Environmental energy harvesting technologies offer battery-powered wireless sensor networks the potential for perpetual network operation. To design a robust network that can adapt to both temporal and spatial variations of ambient energy sources, we use mobility to circumvent communication bottlenecks by employing a mobile data collector, called SenCar. We propose a two-stage approach for mobile data collection. In the first stage, SenCar stops at a subset of selected sensor locations to collect data packets in a multi-hop fashion. We provide a selection algorithm that searches for the sensor locations with the most residual energy while guaranteeing a bounded tour length. Then we design a distributed data gathering algorithm that achieves maximum network utility by adjusting data rates, link scheduling, and flow routing to adapt to spatial-temporal variations in environmental energy. The effectiveness and efficiency of the proposed algorithms are validated by extensive numerical results.


Citations: 11
