Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749455
Theodore R. Haining, D. Long
Many computer hardware and software architectures buffer data in memory to improve system performance. Volatile disk or file caches are sometimes used to delay the propagation of writes to disk (called delayed writes). While delayed writes improve system performance, volatile caches can cause the loss of vital data during sudden failure. In this study, we investigate managing non-volatile RAM (NVRAM) caches with different simple strategies to delay writes to disk. We evaluate the performance of NVRAM caches using three measures of merit: the number of stalled writes which wait while the cache is cleaned before being serviced, the mean service time for I/O requests, and the number of writes generated by cleaning the cache. Our results show that even small non-volatile write caches using simple management policies can reduce the number of writes to disk by at least 70% and as much as 80% in some cases. Our results also show that the number of stalled writes is high: 30% at best and nearly 100% at worst. Adding pro-active purging effectively decreases both stalled writes and disk write activity.
{"title":"Management policies for non-volatile write caches","authors":"Theodore R. Haining, D. Long","doi":"10.1109/PCCC.1999.749455","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749455","url":null,"abstract":"Many computer hardware and software architectures buffer data in memory to improve system performance. Volatile disk or file caches are sometimes used to delay the propagation of writes to disk (called delayed writes). While delayed writes improve system performance, volatile caches can cause the loss of vital data during sudden failure. In this study, we investigate managing non-volatile RAM (NVRAM) caches with different simple strategies to delay writes to disk. We evaluate the performance of NVRAM caches using three measures of merit: the number of stalled writes which wait while the cache is cleaned before being serviced, the mean service time for I/O requests, and the number of writes generated by cleaning the cache. Our results show that even small non-volatile write caches using simple management policies can reduce the number of writes to disk by at least 70% and as much as 80% in some cases. Our results also show that the number of stalled writes is high: 30% at best and nearly 100% at worst. Adding pro-active purging effectively decreases both stalled writes and disk write activity.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115397779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
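The measures named in the Haining and Long abstract (stalled writes and writes generated by cleaning) can be illustrated with a toy replay of a write trace. This is a minimal sketch under stated assumptions, not the authors' simulator: the function name `simulate`, the least-recently-written eviction order, the one-block-at-a-time cleaning, and the `purge_threshold` parameter are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


def simulate(trace, capacity, purge_threshold=None):
    """Replay a sequence of block numbers through a small write cache.

    Returns (stalled_writes, disk_writes). Every policy detail here is an
    illustrative assumption, not a policy evaluated in the paper.
    """
    cache = OrderedDict()  # dirty blocks, least recently written first
    stalled = 0
    disk_writes = 0
    for block in trace:
        if block in cache:
            # Overwrite of a cached block: absorbed, no disk traffic.
            cache.move_to_end(block)
            continue
        if len(cache) >= capacity:
            # Cache is full: this write stalls while one block is cleaned.
            stalled += 1
            cache.popitem(last=False)
            disk_writes += 1
        cache[block] = True
        if purge_threshold is not None:
            # Assumed pro-active purge: clean in the background once the
            # cache holds more than purge_threshold dirty blocks.
            while len(cache) > purge_threshold:
                cache.popitem(last=False)
                disk_writes += 1
    # Remaining dirty blocks are eventually written back.
    disk_writes += len(cache)
    return stalled, disk_writes


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 1, 2, 4, 1]
    print(simulate(trace, capacity=2))
    print(simulate(trace, capacity=2, purge_threshold=1))
```

The toy only shows how the two counters are defined. It makes no attempt to reproduce the percentages reported in the abstract, which depend on the real traces and policies used in the study.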
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749473
Jiandong Huang, Sejun Song, L. Li, P. Kappler, R. Freimark, J. Gustin, T. Kozlik
Presented is an open solution based approach to fault tolerant Ethernet for process control networks. This unique approach provides fault tolerance capability that requires no change of vendor hardware (Ethernet physical link and Network Interface Card) and software (Ethernet driver and protocol), yet it is transparent to control applications. The open fault tolerant Ethernet (OFTE) developed based on this approach performs failure detection and recovery for handling single point of network failure and serves regular IP traffic. Our experimentation shows that OFTE performs efficiently, achieving less than 1 ms end to end LAN swapping time and less than 2 sec failover time, and that concurrent application and system loads have little impact on the performance of failure detection and recovery operations.
{"title":"An open solution to fault-tolerant Ethernet: design, prototyping, and evaluation","authors":"Jiandong Huang, Sejun Song, L. Li, P. Kappler, R. Freimark, J. Gustin, T. Kozlik","doi":"10.1109/PCCC.1999.749473","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749473","url":null,"abstract":"Presented is an open solution based approach to fault tolerant Ethernet for process control networks. This unique approach provides fault tolerance capability that requires no change of vendor hardware (Ethernet physical link and Network Interface Card) and software (Ethernet driver and protocol), yet it is transparent to control applications. The open fault tolerant Ethernet (OFTE) developed based on this approach performs failure detection and recovery for handling single point of network failure and serves regular IP traffic. Our experimentation shows that OFTE performs efficiently, achieving less than 1 ms end to end LAN swapping time and less than 2 sec failover time, and that concurrent application and system loads have little impact on the performance of failure detection and recovery operations.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126790539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749474
T. S. Perraju
Large network systems elevate the levels of efficiency and effectiveness of organisations by improving systems integration. This improved integration of computer networks and information systems is accompanied by increased risks of intrusion and compromise. Survivability of networks and information systems in face of these risks becomes an important aspect in the effective functioning of an organisation. The unbounded nature of today's networks and the Internet make it impossible to foresee all possible risks and cover the systems against all possible attacks. In this scenario, it becomes necessary that the network be designed to resist possible attacks, recognise attacks and recover from attacks. The open ended nature of this problem requires that flexible architectures be employed to improve survivability. Agent technology is a highly flexible and robust paradigm for building large scale distributed systems. We present an agent based framework for Survivable Network Systems.
{"title":"An agent framework for survivable network systems","authors":"T. S. Perraju","doi":"10.1109/PCCC.1999.749474","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749474","url":null,"abstract":"Large network systems elevate the levels of efficiency and effectiveness of organisations by improving systems integration. This improved integration of computer networks and information systems is accompanied by increased risks of intrusion and compromise. Survivability of networks and information systems in face of these risks becomes an important aspect in the effective functioning of an organisation. The unbounded nature of today's networks and the Internet make it impossible to foresee all possible risks and cover the systems against all possible attacks. In this scenario, it becomes necessary that the network be designed to resist possible attacks, recognise attacks and recover from attacks. The open ended nature of this problem requires that flexible architectures be employed to improve survivability. Agent technology is a highly flexible and robust paradigm for building large scale distributed systems. We present an agent based framework for Survivable Network Systems.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"517 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116334318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749478
R. N. Smith, S. Bhattacharya
Firewalls are well known for their task of securing the enterprise intranet from untrusted users attempting to gain access. The concept of firewalls got its start when routers began to be used to balance network load. The effort to balance network traffic load at the transport level was extended to the server operating system where application proxy service and application level filtering is provided. Firewalls allow selected communications data to pass from one side of the corporate network perimeter to the other side. Since the firewall is the primary entry point to a corporate LAN from the Internet, the firewall frequently comes under attack by hackers and crackers. One form of attack is "denial-of-service". "Denial-of-service" attacks are easier to detect than are attacks that allow the attacker through the firewall on a valid password that they obtained by performing social engineering. Spamming the corporate email system is one form of "denial-of-service" attack, while many other forms simply flood the firewall with useless packets to prevent other authorized users from gaining access through the firewall. The paper presents a plan to place firewalls outside the corporate network boundaries, into the Internet. By having firewalls out in the Internet acting as agents for the corporations we expect to see attackers stopped closer to their source gateway. This changes the firewall task from a defensive mode to an offensive one. By having firewalls working together to seek out and locate or block the attacker at the source gateway, we gain several benefits. The paper proposes that the gateway protocol be modified to include this filtering function.
{"title":"Operating firewalls outside the LAN perimeter","authors":"R. N. Smith, S. Bhattacharya","doi":"10.1109/PCCC.1999.749478","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749478","url":null,"abstract":"Firewalls are well known for their task of securing the enterprise intranet from untrusted users attempting to gain access. The concept of firewalls got its start when routers began to be used to balance network load. The effort to balance network traffic load at the transport level was extended to the server operating system where application proxy service and application level filtering is provided. Firewalls allow selected communications data to pass from one side of the corporate network perimeter to the other side. Since the firewall is the primary entry point to a corporate LAN from the Internet, the firewall frequently comes under attack by hackers and crackers. One form of attack is \"denial-of-service\". \"Denial-of-service\" attacks are easier to detect than are attacks that allow the attacker through the firewall on a valid password that they obtained by performing social engineering. Spamming the corporate email system is one form of \"denial-of-service\" attack, while many other forms simply flood the firewall with useless packets to prevent other authorized users from gaining access through the firewall. The paper presents a plan to place firewalls outside the corporate network boundaries, into the Internet. By having firewalls out in the Internet acting as agents for the corporations we expect to see attackers stopped closer to their source gateway. This changes the firewall task from a defensive mode to an offensive one. By having firewalls working together to seek out and locate or block the attacker at the source gateway, we gain several benefits. The paper proposes that the gateway protocol be modified to include this filtering function.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114714327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749426
Anupam Goyal, M. Sundareshan
The concept of person-based number is attractive due to several pressing demands in implementation of current wireless communication systems. Some of these are: sudden proliferation in number of area codes; the need for freedom from number changes due to changes in service provider, location, or area code; and demand for improved and integrated communication service for users. Some important factors that determine the efficiency of a mobility management scheme include fewer number of times that locations need to be updated, reduced amount of overhead data that represents mobility related information, and efficient storage mechanisms that allow for fast storage and retrieval of this information. In this paper, we outline a mobility management scheme that provides a globally unique personal number and offers important benefits in regard to these issues. For an efficient implementation of such a scheme however, an investigation of its performance is required in order to estimate signaling traffic and signaling delays. This paper will focus on some recent studies conducted on an analysis of this scheme by using query and update operations as the metrics for determining average call delay and control data storage and transmission requirements. A comparative analysis is also performed with equivalent cases of query and update in the IS-41 system. This analysis provides a valuable tool to determine network management requirements in a person-based number scheme.
{"title":"Performance analysis of a person-based mobility management scheme for PCN","authors":"Anupam Goyal, M. Sundareshan","doi":"10.1109/PCCC.1999.749426","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749426","url":null,"abstract":"The concept of person-based number is attractive due to several pressing demands in implementation of current wireless communication systems. Some of these are: sudden proliferation in number of area codes; the need for freedom from number changes due to changes in service provider, location, or area code; and demand for improved and integrated communication service for users. Some important factors that determine the efficiency of a mobility management scheme include fewer number of times that locations need to be updated, reduced amount of overhead data that represents mobility related information, and efficient storage mechanisms that allow for fast storage and retrieval of this information. In this paper, we outline a mobility management scheme that provides a globally unique personal number and offers important benefits in regard to these issues. For an efficient implementation of such a scheme however, an investigation of its performance is required in order to estimate signaling traffic and signaling delays. This paper will focus on some recent studies conducted on an analysis of this scheme by using query and update operations as the metrics for determining average call delay and control data storage and transmission requirements. A comparative analysis is also performed with equivalent cases of query and update in the IS-41 system. This analysis provides a valuable tool to determine network management requirements in a person-based number scheme.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130195253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749454
P. Teller, M. Maxwell, A. Gates
Dynamic Monitoring with Integrity Constraints (DynaMICs) is a software-fault monitoring approach in which the constraints are maintained separately from the program. Since the constraints are not entwined in the code, the approach facilitates the maintenance of the application and constraint code. Through code analysis during compilation, the points at which constraint checking should occur are determined. DynaMICs minimizes performance degradation, addressing a problem that has limited the use of runtime software-fault monitoring. This paper presents the preliminary design of a DynaMICs snoopy-coprocessor system, i.e., one that employs a coprocessor that utilizes bus-monitoring hardware to facilitate the concurrent execution of the application and constraint-checking code. In this approach, the coprocessor executes the constraint-checking code while the main processor executes the application code.
{"title":"Towards the design of a snoopy coprocessor for dynamic software-fault detection","authors":"P. Teller, M. Maxwell, A. Gates","doi":"10.1109/PCCC.1999.749454","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749454","url":null,"abstract":"Dynamic Monitoring with Integrity Constraints (DynaMICs) is a software-fault monitoring approach in which the constraints are maintained separately from the program. Since the constraints are not entwined in the code, the approach facilitates the maintenance of the application and constraint code. Through code analysis during compilation, the points at which constraint checking should occur are determined. DynaMICs minimizes performance degradation, addressing a problem that has limited the use of runtime software-fault monitoring. This paper presents the preliminary design of a DynaMICs snoopy-coprocessor system, i.e., one that employs a coprocessor that utilizes bus-monitoring hardware to facilitate the concurrent execution of the application and constraint-checking code. In this approach, the coprocessor executes the constraint-checking code while the main processor executes the application code.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133685796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
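The idea in the Teller, Maxwell and Gates abstract of keeping integrity constraints apart from the application, and checking them when relevant data changes, can be sketched in software. The following is only an analogy under stated assumptions: the class `ConstraintMonitor`, its methods, and the example constraint are hypothetical, and a Python object observing stores stands in for the bus-monitoring coprocessor, which the abstract describes only at the design level.

```python
class ConstraintMonitor:
    """Holds constraints outside the application and checks them on stores."""

    def __init__(self):
        self.state = {}        # last value observed for each variable
        self.constraints = {}  # variable name -> [(description, predicate)]
        self.violations = []   # (variable, value, description) tuples

    def add_constraint(self, variables, description, predicate):
        # Register the constraint under every variable it depends on, so a
        # store to any of them triggers a check.
        for name in variables:
            self.constraints.setdefault(name, []).append((description, predicate))

    def store(self, name, value):
        # Stands in for a store observed on the bus by the monitor.
        self.state[name] = value
        for description, predicate in self.constraints.get(name, []):
            try:
                ok = predicate(self.state)
            except KeyError:
                continue  # constraint refers to a variable not yet written
            if not ok:
                self.violations.append((name, value, description))


if __name__ == "__main__":
    monitor = ConstraintMonitor()
    monitor.add_constraint(
        ["altitude"], "altitude must be non-negative",
        lambda s: s["altitude"] >= 0,
    )
    monitor.store("altitude", 100)
    monitor.store("altitude", -5)
    print(monitor.violations)
```

The application code only calls `store`; the constraint definitions live entirely in the monitor, which mirrors the separation the abstract describes without claiming anything about the actual DynaMICs implementation.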
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749470
D. Lutz, B. Kahne
Plasma is a new tool for modeling the timing of chipsets and other system components. Modeling chipsets is in some ways more difficult than modeling processors: the interfaces are more complex and more numerous, the internal queues and buffers are larger, and the traces are much more complicated. We have used Plasma to create a timing model for a modern chipset. The resulting model is fast, flexible, and useful for both design and verification.
{"title":"Performance analysis for chipsets and systems","authors":"D. Lutz, B. Kahne","doi":"10.1109/PCCC.1999.749470","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749470","url":null,"abstract":"Plasma is a new tool for modeling the timing of chipsets and other system components. Modeling chipsets is in some ways more difficult than modeling processors: the interfaces are more complex and more numerous, the internal queues and buffers are larger, and the traces are much more complicated. We have used Plasma to create a timing model for a modern chipset. The resulting model is fast, flexible, and useful for both design and verification.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127650192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749413
Sumit Roy, V. Chaudhary
Clusters of Symmetrical Multiprocessors (SMPs) have recently become popular as low-cost, high-performance computing solutions. The type of interconnection hardware used in these clusters can become a deciding factor in their overall performance. This paper evaluates the performance of three different communication systems, 10 Mbps Ethernet, 100 Mbps FastEthernet and 155 Mbps ATM, using a multithreaded Distributed Shared Memory system, Strings. The raw performance of each network is first measured using netperf. Ten different applications are then used for performance evaluation, including programs from the SPLASH-2 benchmarks, a medical computing application, and some computational kernels. It is found that half of the programs tested are not significantly affected by changes in the bandwidth. Though the ATM network provides the highest overall bandwidth, the remaining applications show that the increase in latency compared to FastEthernet prevents any performance improvement. On the other hand, applications that require only moderately high bandwidths perform substantially better with FastEthernet.
{"title":"Evaluation of cluster interconnects for a distributed shared memory","authors":"Sumit Roy, V. Chaudhary","doi":"10.1109/PCCC.1999.749413","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749413","url":null,"abstract":"Clusters of Symmetrical Multiprocessors (SMPs) have recently become popular as low-cost, high-performance computing solutions. The type of interconnection hardware used in these clusters can become a deciding factor in their overall performance. This paper evaluates the performance of three different communication systems, 10 Mbps Ethernet, 100 Mbps FastEthernet and 155 Mbps ATM, using a multithreaded Distributed Shared Memory system, Strings. The raw performance of each network is first measured using netperf. Ten different applications are then used for performance evaluation, including programs from the SPLASH-2 benchmarks, a medical computing application, and some computational kernels. It is found that half of the programs tested are not significantly affected by changes in the bandwidth. Though the ATM network provides the highest overall bandwidth, the remaining applications show that the increase in latency compared to FastEthernet prevents any performance improvement. On the other hand, applications that require only moderately high bandwidths perform substantially better with FastEthernet.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"6 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120856441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
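The bandwidth versus latency trade-off noted in the Roy and Chaudhary abstract can be made concrete with a first-order cost model: delivery time is a fixed latency plus the message size divided by bandwidth. The sketch below assumes this simple model; the function names and the latency figures in the example are hypothetical placeholders, not measurements from the paper. Only the nominal bandwidths (10, 100 and 155 Mbps) come from the abstract.

```python
def transfer_time_us(message_bytes, bandwidth_mbps, latency_us):
    """Fixed latency plus serialization time, in microseconds.

    bits / (Mbit/s) equals microseconds, so no extra scaling is needed.
    """
    return latency_us + (message_bytes * 8) / bandwidth_mbps


def crossover_bytes(low_latency_net, high_bandwidth_net):
    """Message size at which two networks deliver in equal time.

    Each argument is a (bandwidth_mbps, latency_us) pair. The first network
    is assumed to have lower latency and lower bandwidth than the second.
    """
    (bw_a, lat_a), (bw_b, lat_b) = low_latency_net, high_bandwidth_net
    return (lat_b - lat_a) / (8 * (1 / bw_a - 1 / bw_b))


if __name__ == "__main__":
    # Latencies below are illustrative assumptions only.
    fast_ethernet = (100, 100.0)
    atm = (155, 250.0)
    for size in (64, 1024, 8192):
        print(size,
              transfer_time_us(size, *fast_ethernet),
              transfer_time_us(size, *atm))
    print(crossover_bytes(fast_ethernet, atm))
```

Under this model, messages smaller than the crossover size are delivered sooner by the lower-latency network even though its bandwidth is lower, which is one way to read the abstract's observation about latency offsetting ATM's bandwidth advantage.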
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749431
Mehul J. Shah, P. Flikkema
In this paper, we develop a physical layer-based framework for the organization of an ad-hoc wireless network. The focus is quasi-static environments, such as multimedia classrooms, and situations characterized by real-time services and high traffic loads. In these cases, centralized control may be preferable due to its simplicity and high efficiency. Using a link loss matrix, an approach is proposed for selection of a network leader that takes into consideration link losses and transmitter powers. Both minimax and minisum criteria are developed to determine uplink, downlink, and overall leaders. A QoS-based iterative algorithm for determination of link transmit powers is also proposed, from which leader selection is obtained as a special case. Numerical results are also presented which explore the effect of quantization of feedback information on the iterative algorithm.
{"title":"Power-based leader selection in ad-hoc wireless networks","authors":"Mehul J. Shah, P. Flikkema","doi":"10.1109/PCCC.1999.749431","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749431","url":null,"abstract":"In this paper, we develop a physical layer-based framework for the organization of an ad-hoc wireless network. The focus is quasi-static environments, such as multimedia classrooms, and situations characterized by real-time services and high traffic loads. In these cases, centralized control may be preferable due to its simplicity and high efficiency. Using a link loss matrix, an approach is proposed for selection of a network leader that takes into consideration link losses and transmitter powers. Both minimax and minisum criteria are developed to determine uplink, downlink, and overall leaders. A QoS-based iterative algorithm for determination of link transmit powers is also proposed, from which leader selection is obtained as a special case. Numerical results are also presented which explore the effect of quantization of feedback information on the iterative algorithm.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128624430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
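The minimax and minisum criteria mentioned in the Shah and Flikkema abstract can be sketched on a small link loss matrix. This is a simplified illustration, not the authors' algorithm: it uses link losses only and ignores transmitter powers and the QoS-based iteration, and the function name `select_leader`, its parameters, and the example losses (in dB) are hypothetical.

```python
def select_leader(loss, criterion="minimax", direction="overall"):
    """Pick a leader index from a square link loss matrix.

    loss[i][j] is the loss on the link from node i to node j.
    criterion: "minimax" minimizes the worst link, "minisum" the total.
    direction: "uplink" (others to leader), "downlink" (leader to others),
               or "overall" (both sets of links).
    """
    n = len(loss)

    def cost(k):
        up = [loss[i][k] for i in range(n) if i != k]
        down = [loss[k][j] for j in range(n) if j != k]
        links = {"uplink": up, "downlink": down, "overall": up + down}[direction]
        return max(links) if criterion == "minimax" else sum(links)

    # Ties are broken in favour of the lowest node index.
    return min(range(n), key=cost)


if __name__ == "__main__":
    # Hypothetical symmetric losses in dB between four nodes.
    loss = [
        [0, 50, 50, 95],
        [50, 0, 80, 80],
        [50, 80, 0, 80],
        [95, 80, 80, 0],
    ]
    print(select_leader(loss, "minimax"))
    print(select_leader(loss, "minisum"))
```

In this example the two criteria disagree: node 0 has the smallest total loss but one very poor link, so minisum selects it while minimax prefers node 1, whose worst link is better.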
Pub Date: 1999-02-10 DOI: 10.1109/PCCC.1999.749469
J. Tyler, Jeff Lent, Anh Mather, Huy Nguyen
Motorola's AltiVec™ Technology provides a new, SIMD vector extension to the PowerPC™ architecture. AltiVec adds 162 new instructions and a powerful new 128-bit datapath, capable of simultaneously executing up to 16 operations per clock. AltiVec instructions allow parallel operation on either 8, 16 or 32-bit integers, as well as 4 IEEE single-precision floating-point numbers. AltiVec technology includes highly flexible "Permute" instructions, which give the data re-organization power needed to maintain a high level of data parallelism. Fine grained data prefetch instructions are also included, which help hide the memory latency of data hungry multimedia applications. All of these features add up to a dramatic performance improvement with the first implementation of AltiVec technology: routines written with AltiVec instructions can execute significantly faster, sometimes by a factor of 10 or more, than traditional scalar PowerPC code. Yet AltiVec technology is flexible enough to be useful in a wide variety of applications.
{"title":"AltiVec™: bringing vector technology to the PowerPC™ processor family","authors":"J. Tyler, Jeff Lent, Anh Mather, Huy Nguyen","doi":"10.1109/PCCC.1999.749469","DOIUrl":"https://doi.org/10.1109/PCCC.1999.749469","url":null,"abstract":"Motorola's AltiVec™ Technology provides a new, SIMD vector extension to the PowerPC™ architecture. AltiVec adds 162 new instructions and a powerful new 128-bit datapath, capable of simultaneously executing up to 16 operations per clock. AltiVec instructions allow parallel operation on either 8, 16 or 32-bit integers, as well as 4 IEEE single-precision floating-point numbers. AltiVec technology includes highly flexible \"Permute\" instructions, which give the data re-organization power needed to maintain a high level of data parallelism. Fine grained data prefetch instructions are also included, which help hide the memory latency of data hungry multimedia applications. All of these features add up to a dramatic performance improvement with the first implementation of AltiVec technology: routines written with AltiVec instructions can execute significantly faster, sometimes by a factor of 10 or more, than traditional scalar PowerPC code. Yet AltiVec technology is flexible enough to be useful in a wide variety of applications.","PeriodicalId":211210,"journal":{"name":"1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130899559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
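The figures in the AltiVec abstract follow from dividing the 128-bit datapath by the element width: 16 lanes of 8 bits, 8 lanes of 16 bits, or 4 lanes of 32 bits. The sketch below models that arithmetic and one lane-wise operation in plain Python. It is an illustration only: the names `lanes` and `vector_add` are hypothetical, Python lists stand in for vector registers, and the wrap-around (modular) behaviour is a modelling choice rather than a description of any particular AltiVec instruction.

```python
VECTOR_BITS = 128  # datapath width stated in the abstract


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def vector_add(a, b, element_bits):
    """Lane-wise add of two packed vectors with wrap-around per lane."""
    n = lanes(element_bits)
    if len(a) != n or len(b) != n:
        raise ValueError(f"expected {n} elements of {element_bits} bits")
    mask = (1 << element_bits) - 1
    # Each lane is computed independently, as a SIMD unit would in one step.
    return [(x + y) & mask for x, y in zip(a, b)]


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, lanes(width))
    print(vector_add([250] * 16, [10] * 16, 8))
```

A scalar loop would need one add per element, whereas the modelled vector operation covers all lanes at once, which is the source of the "up to 16 operations per clock" figure for 8-bit data.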