
Proceedings 10th IEEE International Symposium on High Performance Distributed Computing: Latest Papers

Practical resource management for grid-based visual exploration
K. Czajkowski, A. K. Demir, C. Kesselman, M. Thiébaux
Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the same grid environment in which the data is created and stored. Resource management interfaces are an important structural component of grid computing environments because they enable uniform access to the wide variety of resources necessary for scientific work. We describe a new advance-reservation system for graphics resources and an application of existing grid technology to create general-purpose active storage systems. We report our experience with prototype infrastructure and application components, involving experiments that couple end-to-end resources for interactive visual exploration of large data in representative distributed environments.
DOI: 10.1109/HPDC.2001.945209 (https://doi.org/10.1109/HPDC.2001.945209) · Published: 2001-08-07
Citations: 17
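The abstract does not say how the graphics-resource reservation system decides whether to accept a request. As a minimal sketch, assuming a pool of identical units (for example, graphics pipes) and half-open time intervals, an admission check for advance reservations could look like the following; `ReservationTable` and its methods are hypothetical names, not the authors' interface.

```python
from typing import List, Tuple


class ReservationTable:
    """Hypothetical advance-reservation table for a pool of identical units."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Each accepted reservation is (start, end, units) over [start, end).
        self._reservations: List[Tuple[float, float, int]] = []

    def peak_usage(self, start: float, end: float) -> int:
        """Largest number of units already reserved at any instant in [start, end)."""
        events: List[Tuple[float, int]] = []
        for s, e, n in self._reservations:
            if s < end and start < e:  # half-open intervals overlap
                events.append((max(s, start), n))
                events.append((min(e, end), -n))
        # Tuples sort by time, then delta, so releases precede acquisitions.
        events.sort()
        peak = current = 0
        for _, delta in events:
            current += delta
            peak = max(peak, current)
        return peak

    def reserve(self, start: float, end: float, units: int) -> bool:
        """Accept the request only if capacity is never exceeded during it."""
        if start >= end or units <= 0:
            raise ValueError("need start < end and units > 0")
        if self.peak_usage(start, end) + units > self.capacity:
            return False
        self._reservations.append((start, end, units))
        return True
```

With `capacity=2`, reserving `(0, 10, 1)` and `(5, 15, 1)` succeeds, while `(8, 12, 1)` is rejected because both units are already in use during `[8, 10)`. Checking the peak over the requested window, rather than just counting overlapping reservations, avoids rejecting requests whose overlaps never coincide in time.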
Efficient techniques for distributed computing
Thomas Dramlitsch, Gabrielle Allen, E. Seidel
We discuss a set of novel techniques we are developing, which build on standard tools, to make distributed computing for large-scale simulations across multiple machines (even machines scattered across different continents) a reality. With these techniques we demonstrate that we are able to scale a tightly coupled scientific application in metacomputing environments. Such research and development in metacomputing will lead the way to routine, straightforward and efficient use of distributed computing resources anywhere around the world. This work not only applies to the large-scale astrophysics simulations that motivated it but also opens the way for new, innovative application scenarios.
DOI: 10.1109/HPDC.2001.945214 (https://doi.org/10.1109/HPDC.2001.945214) · Published: 2001-08-07
Citations: 0
A case for TCP Vegas in high-performance computational grids
E. Weigle, Wu-chun Feng
Computational grids such as the Information Power Grid (Johnston et al., 1999), the Particle Physics Data Grid, and the Earth System Grid depend on TCP to provide reliable communication between nodes across a wide-area network (WAN). Of the available TCP implementations, TCP Reno and its variants are the most widely deployed; however, Reno's performance in computational grids is mediocre at best. Because earlier evaluations of TCP implementations produced conflicting results, we present a detailed simulation study that reconciles those results and demonstrates the limitations of earlier work. We focus on the two most debated versions of TCP: Reno and Vegas. Using real traffic distributions, we show that Vegas performs well over modern high-performance links and, with a proper choice of the Vegas parameters α and β, better than Reno. Our results point to ways to significantly enhance the performance of distributed computational grids that rely on TCP.
DOI: 10.1109/HPDC.2001.945186 (https://doi.org/10.1109/HPDC.2001.945186) · Published: 2001-08-07
Citations: 28
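The comparison above hinges on Vegas's two thresholds α and β. The sketch below shows the commonly described Vegas congestion-avoidance rule (one window adjustment per round-trip time), assuming the window is measured in segments and the minimum observed RTT serves as the base RTT. The defaults α = 1 and β = 3 are illustrative assumptions, not the values the paper recommends, and slow start and loss recovery are omitted; `vegas_update` is a hypothetical name.

```python
def vegas_update(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 1.0, beta: float = 3.0) -> float:
    """Hypothetical helper: next congestion window (segments) after one RTT.

    expected: throughput if nothing were queued (cwnd / base_rtt)
    actual:   throughput implied by the latest RTT sample (cwnd / rtt)
    diff:     estimated number of this flow's segments sitting in queues
    """
    if base_rtt <= 0 or rtt < base_rtt:
        raise ValueError("need 0 < base_rtt <= rtt")
    expected = cwnd / base_rtt
    actual = cwnd / rtt
    diff = (expected - actual) * base_rtt
    if diff < alpha:      # too little data queued: probe for more bandwidth
        return cwnd + 1
    if diff > beta:       # too much data queued: back off before losses occur
        return cwnd - 1
    return cwnd           # between alpha and beta: hold the window steady
```

For a 10-segment window with a 100 ms base RTT, an RTT sample of 100 ms grows the window to 11, 125 ms leaves it at 10, and 200 ms shrinks it to 9. Raising α and β lets each flow keep more of its segments queued in the network before it backs off.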
The Kangaroo approach to data movement on the Grid
D. Thain, J. Basney, Se-Chang Son, M. Livny
Access to remote data is one of the principal challenges of Grid computing. While performing I/O, Grid applications must be prepared for server crashes, performance variations and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional data movement techniques, even though its individual components do not achieve high performance.
DOI: 10.1109/HPDC.2001.945200 (https://doi.org/10.1109/HPDC.2001.945200) · Published: 2001-08-07
Citations: 144
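Kangaroo's central idea, as stated above, is to accept data quickly and move it onward in the background so that server crashes and slowdowns are hidden from the application. The following is a minimal, in-memory sketch of that store-and-forward pattern, assuming a caller-supplied `send` function that raises `OSError` on failure. `SpoolingMover` is a hypothetical name; the real system spools to disk and chains multiple hops, which this sketch does not model.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class SpoolingMover:
    """Hypothetical store-and-forward mover: writes land in a spool at once
    and are delivered later, so transient server failures are hidden."""

    def __init__(self, send: Callable[[str, bytes], None]) -> None:
        self._send = send  # may raise OSError when the server is unavailable
        self._spool: Deque[Tuple[str, bytes]] = deque()

    def write(self, path: str, data: bytes) -> None:
        # Return to the application immediately; delivery happens in pump().
        self._spool.append((path, data))

    def pump(self) -> int:
        """Drain the spool in order, stopping at the first failure and keeping
        the remaining items for a later attempt. Returns items delivered."""
        delivered = 0
        while self._spool:
            path, data = self._spool[0]
            try:
                self._send(path, data)
            except OSError:
                break  # server down or overloaded: retry on the next pump
            self._spool.popleft()
            delivered += 1
        return delivered

    def pending(self) -> int:
        return len(self._spool)
```

Calling `pump()` periodically from a background thread or timer gives the "keep applications running" behavior: the application's `write` never blocks on the remote server, and ordering is preserved because delivery always resumes from the oldest spooled item.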
Overview of security considerations for computational and data grids
W. Johnston, S. Talwar, K. Jackson
Large-scale science and engineering is frequently done through the interaction of collaborating groups, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for "Grids" is to enable the routine interactions of these resources to enhance this type of large-scale science and engineering, and thus substantially increase the computing and data handling capabilities available to science and engineering projects. However, even if this environment works in every other way, it will not be viable if it is constantly disrupted by hackers and their kin. Distributed applications are potentially more vulnerable than conventional scientific problem solving environments because there are substantially more targets to attack in order to impact a single application. Much of the overall security of Grids is inherited from the security of the underlying systems. There are, however, some security considerations at the Grid level that are independent of the underlying systems, and we focus on this latter aspect.
DOI: 10.1109/HPDC.2001.945216 (https://doi.org/10.1109/HPDC.2001.945216) · Published: 2001-08-07
Citations: 5
Wide-area transposition-driven scheduling
J. Romein, H. Bal
The distributed searching of state spaces containing cycles is a challenging task and has been studied for several years. Traditional parallel search algorithms either ignore the cyclic nature of the state space and waste much time in duplicated search effort, or they rely on heavy communication to reduce duplicate work, resulting in a large communication overhead. Both methods perform poorly, even when using a fast, local interconnect. A recently developed task distribution scheme, called transposition-driven scheduling (TDS), performs much better, since it communicates asynchronously and efficiently suppresses duplicate search effort. TDS, however, requires bandwidths of megabytes per second per processor. In this paper, we investigate how cyclic state spaces can be searched efficiently on a meta-computing system containing multiple clusters, connected by high-latency, low-bandwidth wide-area links. This is quite a challenge, because the wide-area links provide neither the bandwidth required for TDS nor the latency required for traditional distributed search algorithms. We propose a scheme that strongly reduces communication between clusters at the expense of some duplicate search effort. Performance measurements for several applications show that the new scheme outperforms traditional schemes by a wide margin.
DOI: 10.1109/HPDC.2001.945202 (https://doi.org/10.1109/HPDC.2001.945202) · Published: 2001-08-07
Citations: 8
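The basic TDS mechanism described above can be illustrated with a single-process simulation, assuming states are hashable and each state's owner is `hash(state) % num_workers`. Instead of querying a remote transposition table, a worker pushes every successor to its owner, which checks its own local table. This sketch covers plain TDS only; it does not model the paper's wide-area scheme for trading duplicate search against inter-cluster traffic, and `tds_search` is a hypothetical name.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, List, Set


def tds_search(start: Hashable,
               successors: Callable[[Hashable], Iterable[Hashable]],
               num_workers: int) -> List[Set[Hashable]]:
    """Hypothetical simulation of transposition-driven scheduling.
    Returns each worker's local transposition table."""

    def owner(state: Hashable) -> int:
        # Fixed home worker for a state; its table entry lives only there.
        return hash(state) % num_workers

    inboxes: List[Deque[Hashable]] = [deque() for _ in range(num_workers)]
    tables: List[Set[Hashable]] = [set() for _ in range(num_workers)]
    inboxes[owner(start)].append(start)

    while any(inboxes):
        # Round-robin over workers stands in for concurrent execution.
        for w in range(num_workers):
            while inboxes[w]:
                state = inboxes[w].popleft()
                if state in tables[w]:
                    continue  # duplicate suppressed by a purely local lookup
                tables[w].add(state)
                for nxt in successors(state):
                    # "Send" the work to its owner; no reply is awaited.
                    inboxes[owner(nxt)].append(nxt)
    return tables
```

Because every state is examined only at its owner, duplicates in a cyclic state space are caught without synchronous remote lookups. The price is one message per generated successor, which is exactly the per-processor bandwidth demand that the abstract identifies as a problem on wide-area links.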
Multi-resolution resource behavior queries using wavelets
J. Skicewicz, P. Dinda, J. Schopf
Different adaptive applications are interested in the dynamic behavior of a resource over different fine- to coarse-grain time-scales. The resource's sensor runs at some fine-grain, resource-appropriate sampling rate, producing a discrete-time resource signal. It can be very inefficient to answer a coarse-grain application query by directly using the fine-grain resource signal. We address this gap between the sensor and its different client applications with a novel query model that explicitly incorporates time-scale as a parameter. The query model is implemented on top of an inherently multi-scale wavelet-based representation of the signal (which could be communicated over a set of multicast channels). A query uses only the wavelet coefficients necessary for its time-scale (and thus could listen to a subset of the channels), greatly reducing the data that need to be communicated. We present very promising initial results on host load signals, showing the tradeoff between compactness and query error. Finally, we describe some of the other operations that the wavelet representation enables.
DOI: 10.1109/HPDC.2001.945207 (https://doi.org/10.1109/HPDC.2001.945207) · Published: 2001-08-07
Citations: 16
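To make the time-scale parameter concrete, here is a minimal sketch using the Haar wavelet, assumed here for simplicity (the abstract does not say which wavelet the authors use). A query at a coarse scale needs only the approximation coefficients plus the coarser detail bands, so the finer bands never have to be transmitted. The signal length is assumed to be a multiple of `2**levels`, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: one Haar level, returning pairwise means
    (approximation) and pairwise half-differences (detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def decompose(signal: Sequence[float],
              levels: int) -> Tuple[List[float], List[List[float]]]:
    """Return the coarsest approximation and the detail bands, finest first."""
    approx: List[float] = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def query_at_scale(approx: List[float], details: List[List[float]],
                   level: int) -> List[float]:
    """Signal averaged over blocks of 2**level samples, rebuilt from only the
    bands coarser than `level` (details[level:]); finer bands are unused."""
    x = list(approx)
    for d in reversed(details[level:]):
        x = [v for a, b in zip(x, d) for v in (a + b, a - b)]
    return x
```

For example, `decompose([4, 6, 10, 12, 8, 6, 5, 5], 3)` yields the approximation `[7.0]`, and `query_at_scale(approx, details, 2)` rebuilds the 4-sample averages `[8.0, 6.0]` from just two coefficients, whereas answering the same query from the raw signal would need all eight samples.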
A metadebugger prototype for the HARNESS metacomputing framework
R. Lovas, V. Sunderam
To address the emerging debugging issues in the field of metacomputing, we defined the fundamental principles of an adaptive and integrated debugging and visualization tool: a novel metadebugger. The current prototype has been implemented in the Harness metacomputing framework.
DOI: 10.1109/HPDC.2001.945210 (https://doi.org/10.1109/HPDC.2001.945210) · Published: 2001-08-07
Citations: 1
Massively parallel distributed feature extraction in textual data mining using HDDI™
Jirada Kuntraruk, W. Pottenger
One of the primary tasks in mining distributed textual data is feature extraction. The widespread digitization of information has created a wealth of data that requires novel approaches to feature extraction in a distributed environment. We propose a massively parallel model for feature extraction that employs unused cycles on networks of PCs/workstations in a highly distributed environment. We have developed an analytical model of the time and communication complexity of the feature extraction process in this environment based on feature extraction algorithms developed in our textual data mining research with HDDI™ (Hierarchical Distributed Dynamic Indexing). We show that speedups linear in the number of processors are achievable for applications involving reduction operations based on a novel, parallel pipelined model of execution. We are in the process of validating our analytical model with empirical observations based on the extraction of features from a large number of pages on the World Wide Web.
DOI: 10.1109/HPDC.2001.945204 (https://doi.org/10.1109/HPDC.2001.945204) · Published: 2001-08-07
Citations: 6
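The speedup claim above rests on combining per-processor partial results through reduction operations. As a generic illustration, not HDDI's actual algorithm, the sketch below extracts term frequencies from each processor's share of the documents and merges the partial counts pairwise, so P partials are combined in ⌈log2 P⌉ rounds. The whitespace tokenizer and the function names are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def extract_features(docs: Sequence[str]) -> Counter:
    """Hypothetical per-processor step: lower-cased term frequencies."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def tree_reduce(partials: List[Counter]) -> Tuple[Counter, int]:
    """Hypothetical helper: merge partial counts pairwise and return the
    total together with the number of merge rounds performed."""
    if not partials:
        return Counter(), 0
    level = list(partials)
    rounds = 0
    while len(level) > 1:
        # Each pair could be merged on a different processor in parallel;
        # an unpaired last element is carried into the next round unchanged.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds
```

Because merging counters is associative, the pairwise tree gives the same totals as a sequential merge while keeping the combining phase logarithmic in the number of processors; the local extraction phase is where the near-linear speedup would come from.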
Dynamic replica management in the service grid
Byoung-Dai Lee, J. Weissman
As the Internet is evolving away from providing simple connectivity towards providing more sophisticated services, it is difficult to provide efficient delivery of high-demand services to end users, due to the dynamic sharing of the network and connected servers. To address this problem, we propose the service grid architecture that incorporates dynamic replication and deletion of services.
DOI: 10.1109/HPDC.2001.945213 (https://doi.org/10.1109/HPDC.2001.945213) · Published: 2001-08-07
Citations: 48
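The abstract does not specify when services are replicated or deleted. One common way to drive such decisions, assumed here purely for illustration, is a utilization threshold with hysteresis: add a replica when load per replica is high and remove one when it is low. `adjust_replicas` and its default thresholds are hypothetical.

```python
def adjust_replicas(replicas: int, request_rate: float,
                    capacity_per_replica: float, high: float = 0.8,
                    low: float = 0.3, min_replicas: int = 1) -> int:
    """Hypothetical policy: return the replica count for the next interval.

    The gap between `low` and `high` (hysteresis) keeps the count from
    oscillating when load hovers near a single threshold.
    """
    if replicas < min_replicas or capacity_per_replica <= 0 or not low < high:
        raise ValueError("invalid configuration")
    utilization = request_rate / (replicas * capacity_per_replica)
    if utilization > high:
        return replicas + 1  # overloaded: create a replica
    if utilization < low and replicas > min_replicas:
        return replicas - 1  # underused: delete a replica
    return replicas
```

With a capacity of 100 requests per interval per replica, a single replica facing 90 requests is grown to two, four replicas facing 100 requests are trimmed to three, and the last replica is never deleted.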