A Unified Trace Environment for IBM SP systems
C. Wu, H. Franke, Yew-Huey Liu
IEEE Parallel & Distributed Technology: Systems & Applications, 1996-06-01 (Journal Article)
DOI: 10.1109/M-PDT.1996.494613 (https://doi.org/10.1109/M-PDT.1996.494613)
Citations: 8
Abstract
C. Eric Wu, Hubertus Franke, and Yew-Huey Liu, IBM T. J. Watson Research Center

Distributed parallel processing can increase system computing power beyond the limits of current uniprocessor technology. However, programming such a system with the message-passing model is much more complex than writing sequential programs. To take advantage of the underlying hardware, it is critical to understand both the communication behavior of parallel programs and the system's responses to user applications. One common way of monitoring a program's behavior is to generate trace events while executing the program. The generated events can then be used for other purposes such as debugging and program visualization. However, as we'll see, such a method potentially requires source code modification, increases overhead, and causes clock-synchronization problems.

To meet these challenges, we developed a Unified Trace Environment (UTE) for IBM SP systems. The user-level UTE trace libraries require only relinking to generate message-passing and system events. With the UTE, users can generate message-passing events with minimum overhead, and mark specific portions of the program, such as various phases, loops, and routines, for performance analysis and visualization.

Most user-level trace tools for message-passing systems require source code modification to collect message-passing events. More advanced tools such as the Paradyn system require no source code modification; they insert the code for performance instrumentation into an application program during execution. However, instrumentation daemons cause substantial overhead.

Collecting system events is as important as collecting message-passing events. System and I/O events such as process dispatches and page faults can reveal crucial information on system responses to user applications. The trace facility should also expand easily to trace activities from other software layers, such as parallel I/O file systems and high-level parallel languages. Such expandability enables the same trace facility to trace multiple software systems.

One of the most serious problems in trace analysis for distributed parallel systems is clock synchronization. In such a system, multiple processors generate trace records, and often multiple nodes produce separate streams independently. Because of discrepancies among local clocks, the logical order of events might not be preserved in the trace. As a result, many trace facilities must do additional work to ensure consistent time stamps, thus increasing trace overhead. The challenges of trace analysis
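The clock-synchronization problem described above can be illustrated with a small sketch. The abstract does not say how UTE itself reconciles time stamps, so everything here is an assumption made for illustration: the linear clock model (a fixed offset plus a constant rate), and the names `Event`, `ClockModel`, and `merge_streams` are all hypothetical. The sketch maps each node's local time stamps onto a common reference clock after the run, then merges the per-node streams in reference-time order.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: int
    local_ts: int  # microseconds on the node's own clock
    name: str


@dataclass(frozen=True)
class ClockModel:
    # Hypothetical linear model of one node's clock against a reference clock.
    offset: int   # local clock reading (us) when the reference clock reads 0
    rate: float   # local microseconds elapsed per reference microsecond

    def to_reference(self, local_ts: int) -> float:
        return (local_ts - self.offset) / self.rate


def merge_streams(streams, models):
    """Return (reference_ts, node, name) tuples in reference-time order.

    Each per-node stream must already be sorted by local time; a positive
    rate keeps that order intact after correction.
    """
    def corrected(node, events):
        model = models[node]
        for ev in events:
            yield (model.to_reference(ev.local_ts), ev.node, ev.name)

    return list(heapq.merge(*(corrected(n, evs) for n, evs in streams.items())))


# Node 1's clock runs 5 seconds behind node 0's, so its receive event
# carries a smaller local time stamp than the matching send on node 0.
streams = {
    0: [Event(0, 10_000_000, "send")],
    1: [Event(1, 5_200_000, "recv")],
}
models = {
    0: ClockModel(offset=0, rate=1.0),
    1: ClockModel(offset=-5_000_000, rate=1.0),
}

# Naive ordering by raw local time stamps puts the receive before the send.
raw = sorted((ev for evs in streams.values() for ev in evs),
             key=lambda ev: ev.local_ts)
merged = merge_streams(streams, models)

print([ev.name for ev in raw])
print([name for _, _, name in merged])
```

Doing the correction after the run, rather than at event-generation time, keeps the work out of the traced program, which is one way to avoid the extra trace overhead the text mentions. In practice the offset and rate would have to be estimated from measurements; here they are simply given.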