A Resource-saving Job Monitoring System of High Performance Computing using Parent and Child Process
Kajornsak Piyoungkorn, Phithak Thaenkaew, C. Vorakulpipat
DOI: 10.22323/1.351.0034
High performance computing has become increasingly important over the past decade. The data volumes processed today have grown so large that ordinary computer systems can no longer handle them: scientific experiments involving big data require both high-speed data processing and support for parallel processing. The usual solution is to divide a job into a number of sections, have each processing unit work on its part of the data at the same time, and then send the computed results back to be combined. This mechanism shortens the time to complete a task and generates more output in the same period. The goal of this study is therefore to maximize efficiency in the use of computing resources, in particular the processing power of the CPU cores. When an HPC system has a large number of concurrent users, the processing resources they request often do not match their actual usage. A system is therefore needed to detect job requests that use computing resources inefficiently, helping both users and system administrators work effectively.
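The divide/process/combine pattern described above can be shown in a minimal sketch (not the authors' implementation), using Python's multiprocessing module as a stand-in for the paper's parent and child processes; the workload, summing squares, is a placeholder:

```python
# Minimal sketch of the divide/process/combine pattern: a parent process
# splits the job into chunks, child workers process them concurrently,
# and the parent combines the partial results. The workload is a placeholder.
from multiprocessing import Pool
import os

def process_chunk(chunk):
    # Each child process handles one slice of the data.
    return sum(x * x for x in chunk)

def run_job(data, n_workers=os.cpu_count()):
    # Parent: divide the job into roughly equal chunks ...
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(processes=n_workers) as pool:
        # ... children process the chunks at the same time ...
        partials = pool.map(process_chunk, chunks)
    # ... and the parent combines the partial results.
    return sum(partials)

if __name__ == "__main__":
    print(run_job(list(range(1_000_000))))
```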
{"title":"A Resource-saving Job Monitoring System of High Performance Computing using Parent and Child Process","authors":"Kajornsak Piyoungkorn, Phithak Thaenkaew, C. Vorakulpipat","doi":"10.22323/1.351.0034","DOIUrl":"https://doi.org/10.22323/1.351.0034","url":null,"abstract":"High performance computing has been more important in the past decade. In the present day, data used for processing becomes enormous. Where a high performance computing resource is needed to help process the data. Some scientific experiments involving big data. Which requires high speed data processing cannot be done by an ordinary computer system. Also, there is a need for support of parallel processing. The solution starts by dividing the job into a number of sections to be processed into parts and the processing unit each processing unit of data at the same time. Then, the system sends the calculated result back to the compiled. This mechanism will speed up the processing time to complete the task and generate more output at the same time. Therefore, a solution in this study is to maximize efficiency when using the resources of the computer which involves the processing power of the processor (CPU Cores).When the HPC system has a large number of concurrent users and requests processing resources that do not match the actual usage. Therefore requires a system to detect job requests that use inefficient computing resources to help users and system administrators to work effectively.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125848819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Single Sign-on Establishment for Inter-Cloud Environment
E. Sakane, Takeshi Nishimura, K. Aida, Motonori Nakamura
DOI: 10.22323/1.351.0028
This paper investigates a mechanism that establishes single sign-on for an inter-cloud computing environment built to match the needs of its users. After arranging the requirements and issues for such a mechanism, a single sign-on system for an inter-cloud computing environment is presented. As a concrete service in the inter-cloud environment, we deal with Amazon Web Service using SAML version 2.0 and implement a prototype system. We also evaluate the prototype implementation and consider its applicability to other services.
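The core of such a SAML 2.0 federation into AWS is exchanging an IdP-issued assertion for temporary credentials via STS. A hedged sketch of that step follows (the paper does not say which client library it uses; boto3 and the ARNs below are illustrative assumptions):

```python
# Sketch of SAML 2.0 sign-on to AWS: trade a SAML assertion for
# temporary STS credentials. Both ARNs are hypothetical placeholders.
import boto3

def saml_sign_on(saml_assertion_b64: str):
    sts = boto3.client("sts")
    resp = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::123456789012:role/FederatedUser",       # placeholder
        PrincipalArn="arn:aws:iam::123456789012:saml-provider/MyIdP",  # placeholder
        SAMLAssertion=saml_assertion_b64,  # base64-encoded assertion from the IdP
    )
    creds = resp["Credentials"]
    # Temporary credentials only; no long-lived AWS secret is handled.
    return boto3.session.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```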
{"title":"Toward Single Sign-on Establishment for Inter-Cloud Environment","authors":"E. Sakane, Takeshi Nishimura, K. Aida, Motonori Nakamura","doi":"10.22323/1.351.0028","DOIUrl":"https://doi.org/10.22323/1.351.0028","url":null,"abstract":"This paper investigates a mechanism that establishes single sign-on for inter-cloud computing environment built as the optimized result of the needs of users. Arranging requirements and issues for the mechanism, a single sign-on system for an inter-cloud computing environment is presented. As concrete service in the inter-cloud environment, we deal with Amazon Web Service with SAML version 2.0 and implement a prototype system. We also evaluate the prototype implementation and consider applicability to the other services.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132346927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving efficiency of analysis jobs in CMS
T. Ivanov, S. Belforte, M. Wolf, M. Mascheroni, A. P. Yzquierdo, J. Letts, J. Hernández, L. Cristella, D. Ciangottini, J. Balcas, A. Woodard, K. H. Anampa, B. Bockelman, D. Foyo
DOI: 10.1051/epjconf/201921403006
Hundreds of physicists analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider using the CMS Remote Analysis Builder and the CMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to minimizing time to insight for every scientist, pushing for the fewest possible access restrictions to the full data sample and supporting the free choice of applications to run on the computing resources. Supporting such a variety of workflows while preserving efficient resource usage poses special challenges. In this paper we report on three complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: automatic job splitting, automated run-time estimates, and automated site selection for jobs.
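To illustrate how the first two approaches fit together, here is a hedged sketch of splitting driven by run-time estimates (a toy illustration of the general idea, not the CMS implementation; the target window and the per-unit estimate are assumed inputs):

```python
# Illustrative sketch (not the CMS tooling): size each job from an
# estimated per-unit run time so jobs fit a target wall-clock window.
def split_by_runtime(n_units: int, est_sec_per_unit: float,
                     target_job_sec: float = 8 * 3600) -> list[int]:
    """Return the number of units assigned to each job so that each
    job's estimated run time stays near the target window."""
    units_per_job = max(1, int(target_job_sec / est_sec_per_unit))
    return [min(units_per_job, n_units - i)
            for i in range(0, n_units, units_per_job)]

# e.g. 10,000 events at ~30 s/event, targeting ~8 h jobs -> 960 events/job
print(split_by_runtime(10_000, 30.0))
```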
{"title":"Improving efficiency of analysis jobs in CMS","authors":"T. Ivanov, S. Belforte, M. Wolf, M. Mascheroni, A. P. Yzquierdo, J. Letts, J. Hernández, L. Cristella, D. Ciangottini, J. Balcas, A. Woodard, K. H. Anampa, B. Bockelman, D. Foyo","doi":"10.1051/epjconf/201921403006","DOIUrl":"https://doi.org/10.1051/epjconf/201921403006","url":null,"abstract":"Hundreds of physicists analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider using the CMS Remote Analysis Builder and the CMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to minimizing time to insight for every scientist, by pushing for fewer possible access restrictions to the full data sample and supports the free choice of applications to run on the computing resources. Supporting such variety of workflows while preserving efficient resource usage poses special challenges. In this paper we report on three complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: automatic job splitting, automated run time estimates and automated site selection for jobs.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126371743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulation of the cache hit rate for data readout at the Tokyo Tier-2 center
T. Kishimoto, J. Tanaka, T. Mashimo, M. Kaneda, N. Matsui
DOI: 10.22323/1.351.0030
The Tokyo Tier-2 center, located in the International Center for Elementary Particle Physics at the University of Tokyo, provides computing resources to the ATLAS experiment in the Worldwide LHC Computing Grid. In order to improve the I/O performance and scalability of the file servers in a future system, the possibility of introducing a cache system using fast devices such as SSDs is under discussion. A simulation has therefore been performed to understand the cache behavior, using past data-access logs from the center. This paper reports the simulation method and discusses its results.
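A log-driven cache simulation of this kind can be sketched in a few lines; the version below assumes an LRU eviction policy and a simple (filename, size) log format, neither of which is specified by the paper:

```python
# Minimal sketch of a cache hit-rate simulation replayed from an access
# log, assuming LRU eviction. Log format and policy are assumptions.
from collections import OrderedDict

def simulate_hit_rate(accesses, cache_bytes):
    """accesses: iterable of (filename, size_bytes) read events."""
    cache = OrderedDict()          # filename -> size, kept in LRU order
    used, hits, total = 0, 0, 0
    for name, size in accesses:
        total += 1
        if name in cache:
            hits += 1
            cache.move_to_end(name)    # refresh LRU position on a hit
        else:
            cache[name] = size
            used += size
            while used > cache_bytes:  # evict least recently used files
                _, evicted = cache.popitem(last=False)
                used -= evicted
    return hits / total if total else 0.0

log = [("f1", 5e9), ("f2", 5e9), ("f1", 5e9), ("f3", 8e9), ("f1", 5e9)]
print(simulate_hit_rate(log, cache_bytes=12e9))   # -> 0.2
```

Replaying real access logs through such a model at several cache sizes gives a hit-rate curve from which the cost/benefit of an SSD tier can be judged.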
{"title":"Simulation of the cache hit rate for data readout at the Tokyo Tier-2 center","authors":"T. Kishimoto, J. Tanaka, T. Mashimo, M. Kaneda, N. Matsui","doi":"10.22323/1.351.0030","DOIUrl":"https://doi.org/10.22323/1.351.0030","url":null,"abstract":"The Tokyo Tier-2 center, which is located in the International Center for Elementary Particle Physics at the University of Tokyo, provides computing resources to the ATLAS experiment in the Worldwide LHC Computing Grid. In order to improve the I/O performance and scalability of file servers in the future system, a possibility of introducing a cache system using fast devices such as SSD is under discussion. Therefore, a simulation has been performed to understand the cache behavior using past data access logs at the center. This paper reports a method of the simulation and gives a discussion about its results.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126286999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Building a minimum viable Security Operations Centre for the modern grid environment
D. Crooks, L. Valsan
DOI: 10.22323/1.351.0010

The modern security landscape affecting grid and cloud sites is constantly evolving, with threats arriving from a range of avenues, including social engineering as well as more direct approaches. It is vital to build up operational security capabilities across the Worldwide LHC Computing Grid (WLCG) in order to improve the defence of the community as a whole. As reported at ISGC 2017 and 2018, the WLCG Security Operations Centres (SOC) Working Group (WG) has been working with sites across the WLCG to develop a Security Operations Centre reference design. We present the current status of a minimum viable SOC design applicable to a range of different WLCG sites, centred around a few key components.

The design uses the Zeek Network Intrusion Detection System to monitor what is happening at the network level in strategic locations: for example at the border between the local cluster and external networks, at the border between different local network domains, or at core infrastructure nodes. The MISP Open Source Threat Intelligence Platform is used to share information regarding relevant security events and the associated Indicators of Compromise (IoCs). By feeding IoCs from MISP into Zeek we have a platform that allows the community to share threat intelligence that is immediately actionable across the entire grid.

The logs produced by Zeek are processed using the Elasticsearch, Logstash, Kibana (Elastic) stack for real-time indexing and visualisation. This provides sites with a powerful tool for incident response and network forensics. The alerts raised by Zeek are further aggregated, correlated, and enriched by an advanced notification processing engine. This ensures that most false positives are automatically whitelisted while at the same time reducing the total number of raised alerts that the computer security team of each site has to manage. By enriching these alerts and adding context about what happened around the moment the malicious activity was detected, the time needed to handle them is greatly reduced.

We present possible deployment strategies for all these components in a grid context, as well as the integration between them. We also report on the current status of work on integrating other sources of data, in particular netflow / sflow, into this model. Lastly, we discuss how making use of these SOC capabilities distributed across the participating sites can increase operational security across the entire grid.
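The MISP-to-Zeek step can be sketched as a small converter that writes IoCs in the tab-separated format read by Zeek's Intelligence Framework. This is an illustration under assumptions, not the working group's actual tooling, and the input attribute list is a simplified stand-in for a real MISP export:

```python
# Sketch: export MISP attributes as a Zeek Intelligence Framework file.
# The attribute dicts below are a simplified stand-in for a MISP export.
ZEEK_TYPE = {          # map MISP attribute types to Zeek intel types
    "ip-dst": "Intel::ADDR",
    "ip-src": "Intel::ADDR",
    "domain": "Intel::DOMAIN",
    "url": "Intel::URL",
    "md5": "Intel::FILE_HASH",
    "sha256": "Intel::FILE_HASH",
}

def write_zeek_intel(attributes, path, source="MISP"):
    with open(path, "w") as f:
        # Zeek intel files are tab-separated with a #fields header line.
        f.write("#fields\tindicator\tindicator_type\tmeta.source\n")
        for attr in attributes:
            ztype = ZEEK_TYPE.get(attr["type"])
            if ztype:  # skip attribute types Zeek cannot match on
                f.write(f"{attr['value']}\t{ztype}\t{source}\n")

write_zeek_intel(
    [{"type": "ip-dst", "value": "203.0.113.7"},
     {"type": "domain", "value": "malicious.example"}],
    "misp.intel",
)
```

Loading the resulting file into Zeek makes every shared indicator immediately actionable in the site's network monitoring, which is the point of the MISP-to-Zeek coupling described above.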
{"title":"Building a minimum viable Security Operations Centre for the modern grid environment","authors":"D. Crooks, L. Valsan","doi":"10.22323/1.351.0010","DOIUrl":"https://doi.org/10.22323/1.351.0010","url":null,"abstract":"The modern security landscape affecting grid and cloud sites is constantly evolving, with threats being seen from a range of avenues, including social engineering as well as more direct approaches. It is vital to build up operational security capabilities across the Worldwide LHC Computing Grid (WLCG) in order to improve the defence of the community as a whole. As reported at ISGC 2017 and 2018, the WLCG Security Operations Centres (SOC) Working Group (WG) has been working with sites across the WLCG to develop a model for a Security Operations Centre reference design. We present the current status of a minimum viable SOC design applicable to a range of different WLCG sites, centred around a few key components. \u0000 \u0000The design uses the Zeek Network Intrusion Detection System for monitoring what is happening at the network level in strategic locations: for example at border between the local cluster and external networks, the border between different local network domains or at core infrastructure nodes. The MISP Open Source Threat Intelligence Platform is used to share information regarding relevant security events and the associated Indicators of Compromise (IoCs). By feeding IoCs from MISP into Zeek we have a platform that allows the community to share threat intelligence that is immediately actionable across the entire grid. \u0000 \u0000The logs produced by Zeek are processed using the Elasticsearch, Logstash, Kibana (Elastic) stack for real time indexing and visualisation. This provides sites with a powerful tool for incident response and network forensics. The alerts raised by Zeek are further aggregated, correlated and enriched by an advanced notification processing engine. This ensures that most false positives are automatically whitelisted while at the same time reducing the total number of raised alerts that need to be managed by the computer security team of each site. By enriching these alerts and adding context of what happened around the moment the malicious activity was detected, the time needed to handle these alerts is greatly reduced. \u0000 \u0000We present possible deployment strategies for all these components in a grid context as well as the integration between them. We also report on the current status of work on integrating other sources of data, in particular using netflow / sflow, into this model. \u0000 \u0000Lastly we discuss how making use of these SOC capabilities distributed across the participating sites can lead to increasing the operational security across the entire grid.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131714912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Blueprint of Log Based Monitoring and Diagnosing Framework in Large Distributed Environments
Yining Zhao, Xiaodong Wang, Haili Xiao, Xue-bin Chi
DOI: 10.22323/1.351.0033
Distributed systems have kept scaling upward since the concept appeared, and they have evolved into environments containing heterogeneous components playing different roles, making it difficult to understand how the whole environment works or whether anything undesired has happened from a security point of view. Logs, produced by devices, sub-systems, and running processes, are a very important source of security knowledge for system maintainers. But there are too many logs, and too many kinds of logs, to deal with, which makes manual checking impossible. In this work we share some of our experiences in log processing and analysis. We summarize the common major steps that appear in most existing log-analysis approaches: log selection, log classification, information analysis, and result feedback. We also present a general framework that monitors events, analyzes hidden information, and diagnoses the health of large distributed computing environments based on logs. Although the framework was initially designed for the maintenance of CNGrid, its process is adaptable to other distributed computing environments.
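The four steps named above can be illustrated with a minimal pipeline over raw log lines; the patterns and threshold here are illustrative assumptions, not the paper's actual rules:

```python
# Minimal sketch of the selection -> classification -> analysis ->
# feedback steps over raw log lines. Patterns and threshold are assumed.
import re
from collections import Counter

PATTERNS = {  # classification: map each line to an event class
    "auth_failure": re.compile(r"authentication failure|Failed password"),
    "disk_error": re.compile(r"I/O error|read-only file system"),
}

def analyze(lines, threshold=5):
    # Selection: keep only lines that look security/health relevant.
    selected = [ln for ln in lines if any(p.search(ln) for p in PATTERNS.values())]
    # Classification: count events per class.
    counts = Counter(cls for ln in selected
                     for cls, p in PATTERNS.items() if p.search(ln))
    # Analysis + feedback: report classes whose volume exceeds a threshold.
    return {cls: n for cls, n in counts.items() if n >= threshold}

logs = ["sshd[42]: Failed password for root"] * 6 + ["kernel: I/O error, dev sda"]
print(analyze(logs))   # -> {'auth_failure': 6}
```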
{"title":"A Blueprint of Log Based Monitoring and Diagnosing Framework in Large Distributed Environments","authors":"Yining Zhao, Xiaodong Wang, Haili Xiao, Xue-bin Chi","doi":"10.22323/1.351.0033","DOIUrl":"https://doi.org/10.22323/1.351.0033","url":null,"abstract":"Distributed systems have kept scaling upward since this concept appears, and they soon evolve to environments that contain heterogeneous components playing different roles, making it difficult to understand how the large environment works or if any undesired matters happened from security point of view. Logs, produced by devices, sub-systems and running processes, are a very important source to help system maintainers to get relative security knowledge. But there are too many logs and too many kinds of logs to deal with, which makes manual checking impossible. In this work we will share some of our experiences in log processing and analyzing. We have summarized some common major steps that appear in most of the existing log analysis approaches, including log selection, log classification, information analyses and result feedback. We also represent a general framework that monitors events, analyzes hidden information and diagnoses the healthy state for large distributed computing environments bases on logs. Although the framework we initially designed was for the maintenance for CNGrid, its process is adaptable to other distributed computing environments.","PeriodicalId":106243,"journal":{"name":"Proceedings of International Symposium on Grids & Clouds 2019 — PoS(ISGC2019)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127133870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}