SPINE: A Safe Programmable and Integrated Network Environment
M. Fiuczynski, R. Martin, Tsutomu Owa, B. Bershad
DOI: 10.1145/319195.319197

Abstract: The emergence of fast, cheap embedded processors presents the opportunity to execute code directly on the network interface. We are developing an extensible execution environment, called SPINE, that enables applications to compute directly on the network interface. This structure allows network-oriented applications to communicate with other applications executing on the host CPU, peer devices, and remote nodes with low latency and high efficiency.

1 Introduction

Many I/O-intensive applications, such as multimedia clients, file servers, and host-based IP routers, often move large amounts of data between devices, and therefore place high I/O demands on both the host operating system and the underlying I/O subsystem. Although technology trends point to continued increases in link bandwidth, processor speed, and disk capacity, the lagging performance and scalability of I/O busses is becoming increasingly apparent for I/O-intensive applications. This performance gap exists because recent improvements in workstation performance have not been matched by similar improvements in I/O performance. The exponential growth of processor speed relative to the rest of the I/O system, however, presents the opportunity for application-specific processing to occur directly on intelligent I/O devices. Several network interface cards, such as Myricom's LANai, Alteon's ACEnic, and I2O systems, provide the infrastructure to compute on the device itself. Given the trend toward cheap, fast embedded processors (e.g., StrongARM, PowerPC, MIPS) in intelligent network interface cards, the challenge lies not so much in the hardware design as in redesigning the software architecture to match the capabilities of the raw hardware.
We are working to move application-specific functionality directly onto the network interface, thereby reducing I/O-related data and control transfers to the host system and improving overall system performance. The resulting ensemble of host CPUs and device processors forms a potentially large distributed system. In the context of our work, we are exploring how to program such a system at two levels. At one level, we are investigating how to migrate application functionality onto the network interface. Our approach is empirical: we take a monolithic application and migrate its I/O-specific functionality into a number of device extensions. An extension is code that is logically part of the application but runs directly on the network interface. At the next level, we are defining the operating system interfaces that enable applications to compute directly on an intelligent network interface. Our operating system services rely on two technologies. First, applications and extensions communicate via a message-passing model based on Active Messages [5]. Second, the extensions run in a safe execution environment, called SPINE, that is derived from the SPIN operating system [1]. Applications that will benefit from this software architecture range from those that perform streaming I/O (e.g., multimedia clients/servers and file servers), to host-based IP routers [10], to cluster-based storage management (e.g., Petal [3]), to packet filtering (e.g., Lazy Receive Processing [2]). SPINE offers developers a software architecture with the following three features that are key to implementing I/O-intensive applications efficiently:

• Device-to-device transfers. By avoiding extra copies of data, we can significantly reduce bandwidth requirements in and out of host memory, as well as halve bandwidth over a shared bus, such as PCI.
Additionally, intelligent devices can avoid unnecessary control transfers to the host system, since they can process data before transferring it to a peer device. Techniques such as SPLICE [16] have been introduced to emulate device-to-device transfers.

• Host/device protocol partitioning. Low-level protocol support for application-specific multicast [6], packet filtering (e.g., DPF [12]), and quality of service (e.g., Lazy Receive Processing [2]) has been shown to significantly improve system performance.

• Device-level memory management. An important performance aspect of a network system is the ability to transfer data directly between the network interface and application buffers. This type of support has been investigated by various projects (e.g., UTLB [11], AMII [13], and UNET/MM [14]).

The rest of this paper is organized as follows. In Section 2 we describe the technology trends that argue for the design of smarter I/O devices. In Section 3 we describe the software architecture of SPINE. In Section 4 we describe some example applications that we have built using SPINE in the context of Windows NT. In Section 5 we discuss issues in splitting applications between the host and the network interface.
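To make the message-passing model concrete, the sketch below illustrates the general shape of Active Messages-style dispatch as a device-side runtime might use it: an extension registers a handler under a slot id, and each incoming message names the handler that should run on its payload. This is an illustrative sketch only; the class and method names (HandlerTable, register, deliver) are our own and do not reflect the actual SPINE or Active Messages APIs.

```python
from typing import Callable, Dict


class HandlerTable:
    """Hypothetical device-side table mapping handler slots to extension code."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[[bytes], None]] = {}

    def register(self, slot: int, handler: Callable[[bytes], None]) -> None:
        # An extension registers its handler when it is loaded onto the device.
        if slot in self._handlers:
            raise ValueError(f"handler slot {slot} already taken")
        self._handlers[slot] = handler

    def deliver(self, slot: int, payload: bytes) -> bool:
        # Dispatch loop: run the handler named by the message to completion
        # on the payload; messages naming an unknown slot are dropped.
        handler = self._handlers.get(slot)
        if handler is None:
            return False
        handler(payload)
        return True


# Example "extension": record the packets the device-side handler has seen.
seen = []
table = HandlerTable()
table.register(3, lambda payload: seen.append(payload))
table.deliver(3, b"pkt-1")
table.deliver(3, b"pkt-2")
table.deliver(9, b"pkt-3")  # no handler registered in slot 9: dropped
print(len(seen))            # number of packets the extension handled
```

The key property this model gives the runtime is that handlers run to completion on each message, so the device can process data in place (and, as above, drop or forward it) without involving the host for every packet.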