{"title":"Session details: Technical Session I","authors":"A. Sim","doi":"10.1145/3341229","DOIUrl":"https://doi.org/10.1145/3341229","url":null,"abstract":"","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131207226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance and Security Challenges in Science Workflows. D. Ghosal. DOI: 10.1145/3322798.3329260
Scientific workflows are complex, often generating large amounts of data that must be processed in multiple stages. The data, often generated at remote locations, must be transferred from the source and between distributed HPC nodes interconnected by high-speed networks that also carry other background traffic. Increasingly, many of these scientific workflows require processing to be completed within a deadline, which in turn imposes a deadline on the network data transfer. A recent example of a deadline-driven workflow occurred when the LIGO and Virgo detectors observed a gravitational-wave signal associated with the merger of two neutron stars. The merger, which produced a kilonova, occurred in a galaxy 130 million light-years from Earth in the southern constellation of Hydra. The data from this initial observation had to be processed in a timely manner and sent to astronomers around the world so that they could aim their instruments at the right section of the sky to image the source of the signal.
{"title":"Performance and Security Challenges in Science Workflows","authors":"D. Ghosal","doi":"10.1145/3322798.3329260","DOIUrl":"https://doi.org/10.1145/3322798.3329260","url":null,"abstract":"Scientific workflows are complex, often generating large amounts of data that need to be processed in multiple stages. The data often generated at remote locations must be transferred from the source and between the distributed HPC nodes interconnected by high-speed networks that carry other background traffic. Increasingly, many of these scientific workflows require processing to be completed within a deadline, which, in turn, imposes deadline on the network data transfer. A recent example of a deadline-driven workflow occurred when LIGO and Virgo detectors observed a gravitational wave signal associated with the merger of two neutron stars. The merger, known as a kilonova, occurred in a galaxy 130 million light-years from Earth in the southern constellation of Hydra. The data from this initial observation had to be processed in a timely manner and sent to astronomers around the world so that they could aim their instruments to the right section of the sky to image the source of the signal.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133677096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Software Defined Network Design for Analyzing Streaming Data in Transit. Y. Liu, D. Katramatos. DOI: 10.1145/3322798.3329257
Year after year, network traffic keeps reaching new highs as unprecedented volumes of data flow between an ever-increasing number of sources and destinations. As part of the Analysis on the Wire project, we aim to develop a network-centric approach to streaming data processing that facilitates scientific data analysis and reduces the overhead of sending big data to a data center. This work discusses our design for using programmable network devices to augment network capabilities for streaming data processing, including efforts in progress at Brookhaven National Laboratory and the challenges faced to date.
Real-time Multi-process Tracing Decoder Architecture. Youngsoo Kim, Jonghyun Kim, Ikkyun Kim, Hyunchul Kim. DOI: 10.1145/3322798.3329253
Tracing is a form of logging that records the execution information of programs. Because a large amount of data must be created and decoded in real time, tracers composed mainly of dedicated hardware are widely used. Intel® PT records all information related to software execution from each hardware thread. Once execution of the traced software completes, the exact program flow can be reconstructed from the recorded trace data. A hardware trace program can be integrated into the operating system, but on Windows the kernel source is not disclosed, so tight integration cannot be achieved. Moreover, in a Windows environment the tracer can follow only a single process and provides no way to trace multiple process streams. In this paper, we propose a way of extending the PT trace program to overcome this shortcoming by supporting multi-process stream tracing in the Windows environment.
{"title":"Real-time Multi-process Tracing Decoder Architecture","authors":"Youngsoo Kim, Jonghyun Kim, Ikkyun Kim, Hyunchul Kim","doi":"10.1145/3322798.3329253","DOIUrl":"https://doi.org/10.1145/3322798.3329253","url":null,"abstract":"Tracing is a form of logging by recording the execution information of programs. Since a large amount of data must be created and decoded in real time, a tracer composed mainly of dedicated hardware is widely used. Intel® PT records all information related to software execution from each hardware thread. When the execution of the corresponding software is completed, the accurate program flow can be indicated through the recorded trace data. The hardware trace program can be integrated into the operating system, but in the case of the Windows system, the kernel is not disclosed so tight integration is not achieved. Also, in a Windows environment, it can only trace a single process and do not provide a way to trace multiple process streams. In this paper, we propose a way of extending the PT trace program in order to overcome this shortcoming by supporting multi-process stream tracing in Windows environment.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125390637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time Series Analysis for Efficient Sample Transfers. Hemanta Sapkota, Bahadir A. Pehlivan, Engin Arslan. DOI: 10.1145/3322798.3329256
Real-time transfer optimization approaches offer promising solutions, as they can discover the optimal transfer configuration at runtime without requiring upfront work or making assumptions about the underlying system architecture. On the other hand, existing implementations suffer from slow convergence because they run many sample transfers with suboptimal configurations. In this work, we evaluate time-series models to minimize the impact of sample transfers with suboptimal configurations by shortening the transfer duration without degrading accuracy. Results gathered in various networks with a rich set of transfer configurations indicate that, in most cases, an autoregressive model can accurately estimate sample transfer throughput in less than 5 seconds, up to a 4x improvement over the state-of-the-art solution. We also found that while most common transfer applications report transfer throughput at most once per second, decreasing the reporting interval is key to further reducing the impact of sample transfers by determining their performance quickly.
{"title":"Time Series Analysis for Efficient Sample Transfers","authors":"Hemanta Sapkota, Bahadir A. Pehlivan, Engin Arslan","doi":"10.1145/3322798.3329256","DOIUrl":"https://doi.org/10.1145/3322798.3329256","url":null,"abstract":"Real-time transfer optimization approaches offer promising solutions as they can discover optimal transfer configuration in the runtime without requiring an upfront work or making assumptions about underlying system architectures. On the other hand, existing implementations suffer from slow convergence speed due to running many sample transfers with suboptimal configurations. In this work, we evaluate time-series models to minimize the impact of sample transfers with suboptimal configurations by shortening the transfer duration without degrading the accuracy. The results gathered in various networks with rich set of transfer configurations indicate that, in most cases, Autoregressive model can accurately estimate sample transfer throughput in less than 5 seconds which is up-to 4x improvement over the state-of-the-art solution. We also realized that while the most common transfer applications report transfer throughput at most once a second, decreasing the reporting interval is the key to further reduce the impact of sample transfers by quickly determining their performance.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"390 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116523647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Technical Session II","authors":"A. Lazar","doi":"10.1145/3341230","DOIUrl":"https://doi.org/10.1145/3341230","url":null,"abstract":"","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127670279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Parallel I/O Performance Trends Under Various HPC Configurations. Hanul Sung, Jiwoo Bang, A. Sim, Kesheng Wu, Hyeonsang Eom. DOI: 10.1145/3322798.3329258
In high-performance computing (HPC) environments, an appropriate amount of hardware resources must be used to achieve the best parallel I/O performance. For this reason, HPC users are given tunable parameters that change the HPC configuration and control the amount of resources used. However, some users are not well aware of the relationship between parallel I/O performance and HPC configuration, and thus fail to make use of these parameters. Even users who know the relationship have to run an application under every parameter combination to find the best-performing setting, because each application shows different performance trends under different configurations. This paper analyzes I/O performance trends so that HPC users can find the best configurations with minimal effort. We divide parallel I/O into independent and collective I/O and measure throughput under various configurations using a synthetic workload, the IOR benchmark. Through this analysis, we show that parallel I/O performance is determined by the trade-off between the gain from the parallelism of additional OSTs (Lustre object storage targets) and the loss from contention for shared resources, and that this trend differs depending on the I/O characteristic. Our evaluation shows that HPC applications exhibit performance trends similar to those in our analysis.
{"title":"Understanding Parallel I/O Performance Trends Under Various HPC Configurations","authors":"Hanul Sung, Jiwoo Bang, A. Sim, Kesheng Wu, Hyeonsang Eom","doi":"10.1145/3322798.3329258","DOIUrl":"https://doi.org/10.1145/3322798.3329258","url":null,"abstract":"In high-performance computing (HPC) environments, an appropriate amount of hardware resources must be used for the best parallel I/O performance. For this reason, HPC users are provided with tunable parameters to change the HPC configurations, which control the amounts of resources. However, some users are not well aware of a relationship between the parallel I/O performance and the HPC configuration, and they thus fail to utilize these parameters. Even if users who know the relationship, they have to run an application under every parameter combination to find the setting for the best performance, because each application shows different performance trends under different configurations. The paper shows the result of analyzing the I/O performance trends for HPC users to find the best configurations with minimal efforts. We divide the parallel I/O characteristic into independent and collective I/Os and measure the I/O throughput under various configurations by using synthetic workload, IOR benchmark. Through the analysis, we have figured out that the parallel I/O performance is determined by the trade-off between the gain from the parallelism of increased OSTs and the loss from the contention for shared resources. Also, this performance trend differs depending on the I/O characteristic. Our evaluation shows that HPC applications also have similar performance trends as our analysis.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124799861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Prediction for Data Transfers in LCLS Workflow. Mengtian Jin, Youkow Homma, A. Sim, W. Kroeger, Kesheng Wu. DOI: 10.1145/3322798.3329254
In this work, we study the use of decision-tree-based models to predict transfer rates in different parts of the data pipeline that sends experiment data from the Linac Coherent Light Source (LCLS) at SLAC National Accelerator Laboratory (SLAC) to the National Energy Research Scientific Computing Center (NERSC). The system monitoring the data pipeline collects a number of characteristics, such as the file size, source file system, and start time, all of which are known at the start of a file transfer. However, these static variables do not capture dynamic information such as the current state of the network. We explore a number of ways to capture the state of the network and other dynamic information, and find that adding these dynamic features to the static ones improves transfer performance predictions by up to 10-15%. We additionally study several well-known decision-tree-based models and find that gradient tree boosting performs best overall.
{"title":"Performance Prediction for Data Transfers in LCLS Workflow","authors":"Mengtian Jin, Youkow Homma, A. Sim, W. Kroeger, Kesheng Wu","doi":"10.1145/3322798.3329254","DOIUrl":"https://doi.org/10.1145/3322798.3329254","url":null,"abstract":"In this work, we study the use of decision tree-based models to predict the transfer rates in different parts of the data pipeline that sends experiment data from Linac Coherent Light Source (LCLS) at SLAC National Accelerator Laboratory (SLAC) to National Energy Research Scientific Computing Center (NERSC). The system monitoring the data pipeline collects a number of characteristics such as the file size, source file system, start time and so on, all of which are known at the start of the file transfer. However, these static variables do not capture the dynamic information such as current state of the networking system. In this work, we explore a number of different ways to capture the state of the network and other dynamic information. We find that in addition to using static features, using these dynamic features can improve the transfer performance predictions by up to 10-15%. We additionally study a couple of different well-known decision-tree based models and find that Gradient-Tree Boosting algorithm performs better overall.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126720202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Detection of Network Traffic Anomalies and Changes. Astha Syal, A. Lazar, Jinoh Kim, A. Sim, Kesheng Wu. DOI: 10.1145/3322798.3329255
Accurately predicting network behavior benefits TCP congestion control and can help improve routing, network resource allocation, and network design. The task is challenging because many factors affect network traffic, such as the number of network sessions and synthetic reordering, and there are many ways to measure network state, such as the number of retransmissions per flow and packet duplication. For this work, we use a set of passive TCP flow measurements collected on multiple data transfer nodes (DTNs) at a major computing center. To assist network operations, we propose detecting abnormally slow network transfers in real time. The proposed system breaks the network monitoring logs into fixed-size chunks and employs a state-of-the-art classifier to identify the slow time windows. The method is validated on large real-world datasets collected from several DTNs and quickly detects long intervals of low-performing network transfers that require attention from network engineers.
{"title":"Automatic Detection of Network Traffic Anomalies and Changes","authors":"Astha Syal, A. Lazar, Jinoh Kim, A. Sim, Kesheng Wu","doi":"10.1145/3322798.3329255","DOIUrl":"https://doi.org/10.1145/3322798.3329255","url":null,"abstract":"Accurately predicting network behavior is beneficial for TCP congestion control, and can help improve routing, allocating network resources, and optimizing network designs.This task is challenging because many factors could affect network traffic, such as the number of network sessions and synthetic reordering. There are also many ways to measure the network state, such as the number of retransmissions per flow and packet duplication. For this work, we use a set of passive TCP flow measurements collected at a major computer center on multiple data transfer nodes (DTN). To assist the operations of the computer network, we propose to detect abnormally slow network transfers in real-time. The proposed system breaks the network monitoring logs into fixed-size chunks and employs a state of art classifier to identify the slow time windows. This method will be validated on real large datasets collected from several DTNs. The proposed method is able to generate models to quickly detect large intervals of low performing network transfers, which require attention from network engineers.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"419 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132554581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Similarity-based Compression with Multidimensional Pattern Matching. Olivia Del Guercio, Rafael Orozco, A. Sim, Kesheng Wu. DOI: 10.1145/3322798.3329252
Sensors typically record their measurements with more precision than the accuracy of the sensing technique, so experimental and observational data often contain noise that appears random and cannot easily be compressed. This noise increases storage requirements as well as computation time for analyses. In this work, we describe a line of research to develop data reduction techniques that preserve key features while reducing storage requirements. Our core observation is that the noise in such cases can be characterized by a small number of patterns based on statistical similarity. In earlier tests, this approach reduced storage requirements by over 100-fold for one-dimensional sequences. Here, we explore a set of different similarity measures for multidimensional sequences. In tests with standard quality measures such as the Peak Signal-to-Noise Ratio (PSNR), the new compression methods reduce storage requirements over 100-fold while maintaining relatively low PSNR error. We therefore believe this is an effective strategy for constructing data reduction techniques.
{"title":"Similarity-based Compression with Multidimensional Pattern Matching","authors":"Olivia Del Guercio, Rafael Orozco, A. Sim, Kesheng Wu","doi":"10.1145/3322798.3329252","DOIUrl":"https://doi.org/10.1145/3322798.3329252","url":null,"abstract":"Sensors typically record their measurements using more precision than the accuracy of the sensing techniques. Thus, experimental and observational data often contain noise that appears random and cannot be easily compressed. This noise increases storage requirement as well as computation time for analyses. In this work, we describe a line of research to develop data reduction techniques that preserve the key features while reducing the storage requirement. Our core observation is that the noise in such cases could be characterized by a small number of patterns based on statistical similarity. In earlier tests, this approach was shown to reduce the storage requirement by over 100-fold for one-dimensional sequences. In this work, we explore a set of different similarity measures for multidimensional sequences. During our tests with standard quality measures such as Peak Signal to Noise Ratio (PSNR), we observe that the new compression methods reduce the storage requirements over 100- fold while maintaining relatively low errors in PSNR. Thus, we believe that this is an effective strategy to construct data reduction techniques.","PeriodicalId":365009,"journal":{"name":"Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130238167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}