ARPA: Armenian Paraphrase Detection Corpus and Models
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00012
Arthur Malajyan, K. Avetisyan, Tsolak Ghukasyan
In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.
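To make the pipeline concrete, here is a minimal sketch of the double back-translation loop described above; it is not the authors' code, and `translate` is a hypothetical stand-in for whatever MT model or service is available.

```python
# Minimal sketch of double back-translation for paraphrase candidates.
# `translate` is a placeholder: plug in any Armenian<->English MT system.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder: call an MT model or service of your choice here."""
    raise NotImplementedError

def back_translate_twice(sentence_hy: str) -> str:
    """Armenian -> English -> Armenian, applied twice, as in the paper."""
    for _ in range(2):
        en = translate(sentence_hy, src="hy", tgt="en")
        sentence_hy = translate(en, src="en", tgt="hy")
    return sentence_hy

def make_candidate_pairs(sentences):
    # Each (original, paraphrase) pair is then manually reviewed and annotated.
    return [(s, back_translate_twice(s)) for s in sentences]
```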
{"title":"ARPA: Armenian Paraphrase Detection Corpus and Models","authors":"Arthur Malajyan, K. Avetisyan, Tsolak Ghukasyan","doi":"10.1109/IVMEM51402.2020.00012","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00012","url":null,"abstract":"In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using the method train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrase in Armenian, achieving results comparable to the state-of-the-art of other languages.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130092186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BinSide: Static Analysis Framework for Defects Detection in Binary Code
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00007
H. Aslanyan, Mariam Arutunian, G. Keropyan, S. Kurmangaleev, V. Vardanyan
Software developers make mistakes that can lead to failures of a software product. One approach to detecting defects is static analysis: examining code without executing it. Various source code static analysis tools are currently in wide use for defect detection. However, source code analysis alone is not enough, both because of third-party binary libraries and because the correctness of all compiler optimizations cannot be proven. This paper introduces BinSide, a binary static analysis framework for defect detection. It performs interprocedural, context-sensitive, and flow-sensitive analysis. The framework uses a platform-independent intermediate representation, making it possible to analyze binaries for various architectures. It includes value analysis, reaching-definitions analysis, taint analysis, freed-memory analysis, constant folding, and constant propagation engines. It provides an API (application programming interface) that can be used to develop new analyzers. Using this API, we developed checkers for detecting classic buffer overflow, format string, command injection, double-free, and use-after-free defects.
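As a rough illustration of one of the engines listed above, the sketch below runs a toy flow-sensitive taint propagation over a made-up three-address IR. BinSide's real analyses operate on an IR lifted from binaries and are interprocedural and context-sensitive, which this sketch does not attempt to model; the source and sink names are illustrative.

```python
# Toy flow-sensitive taint propagation over a tiny three-address IR.

TAINT_SOURCES = {"read_input"}   # hypothetical taint-introducing call
TAINT_SINKS = {"strcpy"}         # classic buffer-overflow sink

def find_tainted_sinks(instructions):
    """instructions: list of (dst, op, args) tuples in program order."""
    tainted = set()
    findings = []
    for i, (dst, op, args) in enumerate(instructions):
        if op == "call" and args[0] in TAINT_SOURCES:
            tainted.add(dst)                   # source introduces taint
        elif op == "call" and args[0] in TAINT_SINKS:
            if any(a in tainted for a in args[1:]):
                findings.append((i, args[0]))  # tainted data reaches a sink
        elif any(a in tainted for a in args):
            tainted.add(dst)                   # taint propagates via assignment
    return findings

prog = [
    ("x", "call", ("read_input",)),
    ("y", "mov", ("x",)),
    ("_", "call", ("strcpy", "buf", "y")),
]
print(find_tainted_sinks(prog))  # [(2, 'strcpy')]
```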
{"title":"BinSide : Static Analysis Framework for Defects Detection in Binary Code","authors":"H. Aslanyan, Mariam Arutunian, G. Keropyan, S. Kurmangaleev, V. Vardanyan","doi":"10.1109/IVMEM51402.2020.00007","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00007","url":null,"abstract":"Software developers make mistakes that can lead to failures of a software product. One approach to detect defects is static analysis: examine code without execution. Currently, various source code static analysis tools are widely used to detect defects. However, source code analysis is not enough. The reason for this is the use of third-party binary libraries, the unprovability of the correctness of all compiler optimizations. This paper introduces BinSide : binary static analysis framework for defects detection. It does interprocedural, context-sensitive and flow-sensitive analysis. The framework uses platform independent intermediate representation and provide opportunity to analyze various architectures binaries. The framework includes value analysis, reaching definition, taint analysis, freed memory analysis, constant folding, and constant propagation engines. It provides API (application programming interface) and can be used to develop new analyzers. Additionally, we used the API to develop checkers for classic buffer overflow, format string, command injection, double free and use after free defects detection.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114780259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architecture and deployment details of scalable Jupyter environment at Kurchatov Institute supercomputing centre
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00017
A. Teslyuk, S. Bobkov, Alexander Belyaev, Alexander Filippov, K. Izotov, I. Lyalin, Andrey Shitov, Leonid Yasnopolsky, V. Velikhov
Jupyter Notebook is a popular framework for interactive application development and data analysis. Deploying JupyterHub on a supercomputer infrastructure combines high computing power and large storage capacity with convenience and ease of use for end users. In this work we present the architecture and deployment details of the Jupyter framework in the Kurchatov Institute computing infrastructure. In our setup we combined JupyterHub with the CephFS storage system, the FreeIPA user management system, and a customized CUDA-compatible image with worker applications, using Kubernetes as the component orchestrator.
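For flavor, a jupyterhub_config.py along these lines wires such pieces together. This is an illustrative sketch rather than the centre's actual configuration: the option names come from the public kubespawner and ldapauthenticator packages, while the image name, claim name, mount path, and LDAP addresses are made-up placeholders.

```python
# jupyterhub_config.py — illustrative sketch, not the centre's real config.
# `c` is the config object JupyterHub injects when loading this file.

c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Custom CUDA-compatible single-user image (placeholder name).
c.KubeSpawner.image = "registry.example.org/jupyter-cuda:latest"

# Home directories on CephFS via a pre-created PVC (placeholder names).
c.KubeSpawner.volumes = [
    {"name": "home", "persistentVolumeClaim": {"claimName": "cephfs-home"}}
]
c.KubeSpawner.volume_mounts = [{"name": "home", "mountPath": "/home/jovyan"}]

# Users come from FreeIPA, which exposes an LDAP interface.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
c.LDAPAuthenticator.server_address = "ipa.example.org"
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},cn=users,cn=accounts,dc=example,dc=org"
]
```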
{"title":"Architecture and deployment details of scalable Jupyter environment at Kurchatov Institute supercomputing centre","authors":"A. Teslyuk, S. Bobkov, Alexander Belyaev, Alexander Filippov, K. Izotov, I. Lyalin, Andrey Shitov, Leonid Yasnopolsky, V. Velikhov","doi":"10.1109/IVMEM51402.2020.00017","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00017","url":null,"abstract":"Jupyter notebook is a popular framework for interactive application development and data analysis. Deployment of JupyterHub on a supercomputer infrastructure would allow to combine high computing power and large storage capacity with convenience and ease of use for end users. In this work we present the architecture and deployment details of Jupyter framework in Kurchatov Institute computing infrastructure. In our setup we combined JupyterHub with CEPHfs storage system, FreeIPA user management system, customized CUDA-compatible image with worker applications and used Kubernetes as a component orchestrator.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124809545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A State-based Refinement Technique for Event-B
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00015
A. Khoroshilov, V. Kuliamin, A. Petrenko, I. Shchepetkov
Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows design errors and inconsistencies to be detected early and fixed before development starts. Such models are usually created by stepwise refinement: starting with a simple, abstract model of the system and incrementally refining it, adding more details at each subsequent refinement level. The top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B that can simplify the development of complicated models with a large gap between high-level design and implementation.
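For background, the standard Event-B refinement proof obligations that any such technique must relate to can be stated as follows (this is textbook Event-B, not the paper's new technique). Here $I$ is the abstract invariant, $J$ the gluing invariant linking abstract state $v$ to concrete state $w$, $G$/$H$ the abstract/concrete guards, and $BA_a$/$BA_c$ the before-after predicates:

```latex
% Guard strengthening (GRD) and simulation (SIM), textbook form.
\begin{align*}
  \text{(GRD)} &\quad I(v) \land J(v, w) \land H(w) \;\Rightarrow\; G(v) \\
  \text{(SIM)} &\quad I(v) \land J(v, w) \land H(w) \land BA_c(w, w')
                 \;\Rightarrow\; \exists v' \cdot BA_a(v, v') \land J(v', w')
\end{align*}
```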
{"title":"A State-based Refinement Technique for Event-B","authors":"A. Khoroshilov, V. Kuliamin, A. Petrenko, I. Shchepetkov","doi":"10.1109/IVMEM51402.2020.00015","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00015","url":null,"abstract":"Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows detecting design errors and inconsistencies early and fixing them before starting development. Such models are usually created using stepwise refinement: starting with the simple, abstract model of the system, and then incrementally refining it adding more details at each subsequent level of refinement. Top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B which can simplify the development of complicated models with a large gap between high-level design and implementation.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129483255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High Energy Physics Data Popularity: ATLAS Datasets Popularity Case Study
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00010
M. Grigorieva, E. Tretyakov, A. Klimentov, D. Golubkov, T. Korchuganova, A. Alekseev, A. Artamonov, T. Galkin
The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed, and analyzed in hundreds of computing centers. The popularity of data among individual physicists and university groups has become one of the key factors in efficient data management and processing. Popularity metrics were actively used by the experiments during LHC Run 1 and Run 2 for central data processing, making it possible to optimize data placement policies and to spread the workload more evenly over the existing computing resources. Besides central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Given the significant increase in data volume and processing time after the collider upgrade for the High Luminosity runs (2027–2036), intelligent data placement based on data access patterns becomes even more crucial than at the beginning of the LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of the computing sites where the data were processed and the locations of the home institutes of users carrying out physics analysis. Cartographic visualization based on these data allows existing data placement to be correlated with physics needs, providing a better understanding of data utilization by different categories of user tasks.
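A popularity study of this kind ultimately reduces to aggregating access records per dataset. The sketch below shows the idea on a hypothetical CSV access log with made-up column names; the actual ATLAS analysis draws on far richer metadata sources.

```python
# Aggregate a hypothetical access log into per-dataset popularity metrics.
import pandas as pd

# Assumed columns: dataset, user, site, timestamp.
log = pd.read_csv("access_log.csv", parse_dates=["timestamp"])

popularity = (
    log.groupby("dataset")
       .agg(accesses=("user", "size"),          # total access count
            unique_users=("user", "nunique"),   # breadth of interest
            sites=("site", "nunique"))          # geographic spread
       .sort_values("accesses", ascending=False)
)
print(popularity.head(10))  # the "hottest" datasets: replica candidates
```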
{"title":"High Energy Physics Data Popularity : ATLAS Datasets Popularity Case Study","authors":"M. Grigorieva, E. Tretyakov, A. Klimentov, D. Golubkov, T. Korchuganova, A. Alekseev, A. Artamonov, T. Galkin","doi":"10.1109/IVMEM51402.2020.00010","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00010","url":null,"abstract":"The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. The popularity of data among individual physicists and University groups has become one of the key factors of efficient data management and processing. It was actively used during LHC Run 1 and Run 2 by the experiments for the central data processing, and allowed the optimization of data placement policies and to spread the workload more evenly over the existing computing resources. Besides the central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Taking into account the significant increase of data volume and processing time after the collider upgrade for the High Luminosity Runs (2027– 2036) an intelligent data placement based on data access pattern becomes even more crucial than at the beginning of LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of computing sites where the data were processed, and the locality of the home institutes of users carrying out physics analysis. Cartography visualization, based on this data, allows the correlation of existing data placement with physics needs, providing a better understanding of data utilization by different categories of user’s tasks.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114823419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of Watermark Embedding Capacity with Line Space Shifting
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00011
A. Kozachok, S. Kopylov
The article describes an analytical model for evaluating the maximum achievable embedding capacity of a robust watermark based on an approach that embeds information in text data by shifting line spacing. The developed model allows assessing bounds on the amount of information that a watermark embedded into printed text data may contain. In developing the analytical model, the dependence of the maximum achievable embedding capacity on the number of lines in a text document and on the watermark embedding parameters was established. The relationship between the parameters of a text document and the number of lines per page is described mathematically. Calculations based on the obtained expressions and the corresponding experiments are carried out, and the simulation results are checked against the parameters of texts printed on paper. The simulation results are analyzed, and a linear dependence is established. The obtained values are approximated, yielding analytical expressions that quantify the maximum achievable embedding capacity of the developed robust watermark as a function of the embedding parameters used. The degree of tension among the following parameters of robust watermarks is estimated: embedding capacity, extractability, and robustness. The relationship between the maximum achievable embedding capacity and the extraction accuracy of the developed watermark is determined, and quantitative estimates of the influence of watermark size on the final extraction accuracy of embedded information are given. Directions for further research are outlined.
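As a back-of-the-envelope companion to the analytical model, the sketch below bounds capacity under the simplifying assumption that each of the n−1 gaps between n lines on a page encodes one bit via a small shift; all layout numbers are illustrative defaults, not the paper's parameters.

```python
# Crude upper bound on line-space-shifting capacity under the assumption
# that each inter-line gap carries one bit. Layout defaults are illustrative.

def lines_per_page(page_h_mm=297.0, top_mm=20.0, bottom_mm=20.0,
                   font_pt=14.0, spacing=1.5):
    pt_to_mm = 25.4 / 72.0
    line_h_mm = font_pt * spacing * pt_to_mm
    return int((page_h_mm - top_mm - bottom_mm) // line_h_mm)

def max_capacity_bits(pages, **layout):
    n = lines_per_page(**layout)
    return pages * max(n - 1, 0)   # n lines -> n-1 adjustable gaps

print(lines_per_page())        # ~34 lines on A4 at 14 pt, 1.5 spacing
print(max_capacity_bits(10))   # bound for a 10-page document
```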
{"title":"Estimation of Watermark Embedding Capacity with Line Space Shifting","authors":"A. Kozachok, S. Kopylov","doi":"10.1109/IVMEM51402.2020.00011","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00011","url":null,"abstract":"The article describes an analytical model of the maximum achievable embedding capacity evaluation for robust watermark based on the approach to information embedding in text data by line space shifting. The developed model allows to boundary values assessment of information amount that may contain a watermark embedded into text data printed. In the developing process of an analytical model, the dependence of maximum achievable embedding capacity on the lines amount of a text document and the used watermark embedding parameters was established. The relationship between the parameters of a text document and the lines number per page of a text document is mathematically described. Mathematical calculations of the obtained expressions and the corresponding experimental researches are conducted. The evaluation of obtained simulation results correspondence to the parameters of texts printed on paper is implemented. The simulation results are analyzed and a linear dependence of the results is established. The obtained values are approximated and analytical expressions that allow one to quantify the maximum achievable embedding capacity of the developed robust watermark depending on the embedding parameters used are received. The degree of contradictions between the following parameters of robust watermarks: embedding capacity, extractability and robustness is estimated. The relationship between the maximum achievable embedding capacity and the accuracy of the extraction of the developed watermark is determined. Quantitative estimates of the influence of the size of the watermark on the final extraction accuracy of embedded information are given. The further research directions are determined.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133987245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Possibilities of Computer Lexicography in Compiling Highly Specialized Terminological Printed and Electronic Dictionaries (Field of Aviation Engineering)
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00013
V. Ryzhkova
The article covers modern trends in compiling printed and electronic field-specific dictionaries of technical terms, addressing both theoretical and practical aspects of compiling such dictionaries.
{"title":"Possibilities of Computer Lexicography in Compiling Highly Specialized Terminological Printed and Electronic Dictionaries (Field of Aviation Engineering)","authors":"V. Ryzhkova","doi":"10.1109/IVMEM51402.2020.00013","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00013","url":null,"abstract":"The article covers the modern trends of compiling printed and electronic field-specific dictionaries of technical terms. It discloses both theoretical and practical aspects of compiling such dictionaries.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116800654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of pseudo-random sequences based on the random forest algorithm
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00016
A. Kozachok, A. Spirin, Alexander I. Kozachok, Alexey N. Tsibulia
Motivated by the increased number of information leaks caused by internal violators and the lack of mechanisms in modern DLP systems to counter information leaks in encrypted or compressed form, we propose a method for classifying sequences produced by encryption and data compression algorithms. An algorithm for constructing a random forest is proposed, and the choice of classifier hyperparameters is justified. The presented approach achieves a classification accuracy of 0.98 on the sequences considered in this work.
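A minimal sketch of such a classifier is shown below, assuming byte-histogram features and toy stand-ins for the two classes (os.urandom approximating ciphertext, zlib output approximating compressed data); the paper's actual feature set and hyperparameters may differ.

```python
# Toy random-forest classifier for "encrypted vs compressed" sequences.
import os
import zlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def byte_histogram(seq: bytes) -> np.ndarray:
    """Normalized 256-bin byte frequency vector."""
    counts = np.bincount(np.frombuffer(seq, dtype=np.uint8), minlength=256)
    return counts / max(len(seq), 1)

# Toy stand-ins for the two classes.
rng = np.random.default_rng(0)
enc = [os.urandom(4096) for _ in range(200)]
com = [zlib.compress(rng.integers(0, 64, 8192, dtype=np.uint8).tobytes())
       for _ in range(200)]

X = np.array([byte_histogram(s) for s in enc + com])
y = np.array([0] * len(enc) + [1] * len(com))  # 0 = encrypted, 1 = compressed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy on the toy data
```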
{"title":"Classification of pseudo-random sequences based on the random forest algorithm","authors":"A. Kozachok, A. Spirin, Alexander I. Kozachok, Alexey N. Tsibulia","doi":"10.1109/IVMEM51402.2020.00016","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00016","url":null,"abstract":"Due to the increased number of information leaks caused by internal violators and the lack of mechanisms in modern DLP systems to counter information leaks in encrypted or compressed form, was proposed a method for classifying sequences formed by encryption and data compression algorithms. An algorithm for constructing a random forest was proposed, and the choice of classifier hyper parameters was justified. The presented approach showed the accuracy of classification of the sequences specified in the work 0.98.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127685838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Determining Soil Parameters
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00020
S. Zasukhin, E. Zasukhina
The problem of determining soil parameters is considered. Exact knowledge of these parameters is of great importance for planning and managing water systems, assessing the possible size of catastrophic floods, etc. We propose to find these parameters by solving an optimal control problem in which the controlled process is described by the Richards equation. The objective function is the mean-square deviation of the observed soil moisture values from the simulated values obtained by solving the Richards equation with the selected parameter values. Numerical optimization is performed using Newton's method, and the derivatives of the objective function are calculated using fast automatic differentiation techniques.
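The sketch below mirrors this calibration loop with a cheap placeholder forward model in place of a Richards-equation solver, and hand-written gradients where the paper uses fast automatic differentiation; the parameter values and model form are purely illustrative.

```python
# Toy parameter calibration: fit model parameters by minimizing the
# mean-square deviation from "observed" moisture with Newton's method.
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 50)

def simulate(params, t):
    """Placeholder forward model standing in for a Richards-equation solver."""
    a, b = params
    return a * np.exp(-b * t)

# Synthetic observations from known parameters plus a little noise.
observed = simulate((0.4, 1.3), t) \
    + 0.005 * np.random.default_rng(1).standard_normal(t.size)

def objective(params):
    r = simulate(params, t) - observed
    return 0.5 * np.mean(r ** 2)       # mean-square deviation, as in the paper

def gradient(params):
    # Hand-derived gradient for the toy model (the paper uses fast AD).
    a, b = params
    e = np.exp(-b * t)
    r = a * e - observed
    return np.array([np.mean(r * e), np.mean(r * a * -t * e)])

res = minimize(objective, x0=np.array([1.0, 1.0]),
               jac=gradient, method="Newton-CG", options={"xtol": 1e-8})
print(res.x)  # should land near the true (0.4, 1.3)
```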
{"title":"Determining Soil Parameters","authors":"S. Zasukhin, E. Zasukhina","doi":"10.1109/IVMEM51402.2020.00020","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00020","url":null,"abstract":"The problem of determining soil parameters is considered. Their exact knowledge is of great importance for planning and managing water systems, assessing the possible size of catastrophic floods, etc. These parameters are proposed to be found by solving some optimal control problem, where the controlled process is described by the Richards equation. The objective function is mean-square deviation of the observed soil moisture values from its simulated values, which are obtained from the solution of the Richards equation with the selected parameters values. Numerical optimization is performed using Newton method. Derivatives of the objective function are calculated using fast automatic differentiation techniques.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"123 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113944920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}