ARPA: Armenian Paraphrase Detection Corpus and Models
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00012
Arthur Malajyan, K. Avetisyan, Tsolak Ghukasyan
In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.
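To make the pipeline concrete, here is a minimal sketch of the double back-translation loop described above; it is not the authors' code, and `translate` is a hypothetical stand-in for whatever MT model or service is available.

```python
# Minimal sketch of double back-translation for paraphrase candidates.
# `translate` is a placeholder: plug in any Armenian<->English MT system.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder: call an MT model or service of your choice here."""
    raise NotImplementedError

def back_translate_twice(sentence_hy: str) -> str:
    """Armenian -> English -> Armenian, applied twice, as in the paper."""
    for _ in range(2):
        en = translate(sentence_hy, src="hy", tgt="en")
        sentence_hy = translate(en, src="en", tgt="hy")
    return sentence_hy

def make_candidate_pairs(sentences):
    # Each (original, paraphrase) pair is then manually reviewed and annotated.
    return [(s, back_translate_twice(s)) for s in sentences]
```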
{"title":"ARPA: Armenian Paraphrase Detection Corpus and Models","authors":"Arthur Malajyan, K. Avetisyan, Tsolak Ghukasyan","doi":"10.1109/IVMEM51402.2020.00012","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00012","url":null,"abstract":"In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using the method train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrase in Armenian, achieving results comparable to the state-of-the-art of other languages.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130092186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BinSide: Static Analysis Framework for Defects Detection in Binary Code
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00007
H. Aslanyan, Mariam Arutunian, G. Keropyan, S. Kurmangaleev, V. Vardanyan
Software developers make mistakes that can lead to failures of a software product. One approach to detecting defects is static analysis: examining code without executing it. Various source code static analysis tools are currently in wide use for defect detection. However, source code analysis alone is not enough, both because of third-party binary libraries and because the correctness of all compiler optimizations cannot be proven. This paper introduces BinSide, a binary static analysis framework for defect detection. It performs interprocedural, context-sensitive, and flow-sensitive analysis. The framework uses a platform-independent intermediate representation, making it possible to analyze binaries for various architectures. It includes value analysis, reaching-definitions analysis, taint analysis, freed-memory analysis, constant folding, and constant propagation engines. It provides an API (application programming interface) that can be used to develop new analyzers. Using this API, we developed checkers for detecting classic buffer overflow, format string, command injection, double-free, and use-after-free defects.
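As a rough illustration of one of the engines listed above, the sketch below runs a toy flow-sensitive taint propagation over a made-up three-address IR. BinSide's real analyses operate on an IR lifted from binaries and are interprocedural and context-sensitive, which this sketch does not attempt to model; the source and sink names are illustrative.

```python
# Toy flow-sensitive taint propagation over a tiny three-address IR.

TAINT_SOURCES = {"read_input"}   # hypothetical taint-introducing call
TAINT_SINKS = {"strcpy"}         # classic buffer-overflow sink

def find_tainted_sinks(instructions):
    """instructions: list of (dst, op, args) tuples in program order."""
    tainted = set()
    findings = []
    for i, (dst, op, args) in enumerate(instructions):
        if op == "call" and args[0] in TAINT_SOURCES:
            tainted.add(dst)                   # source introduces taint
        elif op == "call" and args[0] in TAINT_SINKS:
            if any(a in tainted for a in args[1:]):
                findings.append((i, args[0]))  # tainted data reaches a sink
        elif any(a in tainted for a in args):
            tainted.add(dst)                   # taint propagates via assignment
    return findings

prog = [
    ("x", "call", ("read_input",)),
    ("y", "mov", ("x",)),
    ("_", "call", ("strcpy", "buf", "y")),
]
print(find_tainted_sinks(prog))  # [(2, 'strcpy')]
```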
{"title":"BinSide : Static Analysis Framework for Defects Detection in Binary Code","authors":"H. Aslanyan, Mariam Arutunian, G. Keropyan, S. Kurmangaleev, V. Vardanyan","doi":"10.1109/IVMEM51402.2020.00007","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00007","url":null,"abstract":"Software developers make mistakes that can lead to failures of a software product. One approach to detect defects is static analysis: examine code without execution. Currently, various source code static analysis tools are widely used to detect defects. However, source code analysis is not enough. The reason for this is the use of third-party binary libraries, the unprovability of the correctness of all compiler optimizations. This paper introduces BinSide : binary static analysis framework for defects detection. It does interprocedural, context-sensitive and flow-sensitive analysis. The framework uses platform independent intermediate representation and provide opportunity to analyze various architectures binaries. The framework includes value analysis, reaching definition, taint analysis, freed memory analysis, constant folding, and constant propagation engines. It provides API (application programming interface) and can be used to develop new analyzers. Additionally, we used the API to develop checkers for classic buffer overflow, format string, command injection, double free and use after free defects detection.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114780259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architecture and deployment details of scalable Jupyter environment at Kurchatov Institute supercomputing centre
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00017
A. Teslyuk, S. Bobkov, Alexander Belyaev, Alexander Filippov, K. Izotov, I. Lyalin, Andrey Shitov, Leonid Yasnopolsky, V. Velikhov
Jupyter Notebook is a popular framework for interactive application development and data analysis. Deploying JupyterHub on a supercomputer infrastructure combines high computing power and large storage capacity with convenience and ease of use for end users. In this work we present the architecture and deployment details of the Jupyter framework in the Kurchatov Institute computing infrastructure. In our setup we combined JupyterHub with the CephFS storage system, the FreeIPA user management system, and a customized CUDA-compatible image with worker applications, using Kubernetes as the component orchestrator.
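For flavor, a jupyterhub_config.py along these lines wires such pieces together. This is an illustrative sketch rather than the centre's actual configuration: the option names come from the public kubespawner and ldapauthenticator packages, while the image name, claim name, mount path, and LDAP addresses are made-up placeholders.

```python
# jupyterhub_config.py — illustrative sketch, not the centre's real config.
# `c` is the config object JupyterHub injects when loading this file.

c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Custom CUDA-compatible single-user image (placeholder name).
c.KubeSpawner.image = "registry.example.org/jupyter-cuda:latest"

# Home directories on CephFS via a pre-created PVC (placeholder names).
c.KubeSpawner.volumes = [
    {"name": "home", "persistentVolumeClaim": {"claimName": "cephfs-home"}}
]
c.KubeSpawner.volume_mounts = [{"name": "home", "mountPath": "/home/jovyan"}]

# Users come from FreeIPA, which exposes an LDAP interface.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
c.LDAPAuthenticator.server_address = "ipa.example.org"
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},cn=users,cn=accounts,dc=example,dc=org"
]
```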
{"title":"Architecture and deployment details of scalable Jupyter environment at Kurchatov Institute supercomputing centre","authors":"A. Teslyuk, S. Bobkov, Alexander Belyaev, Alexander Filippov, K. Izotov, I. Lyalin, Andrey Shitov, Leonid Yasnopolsky, V. Velikhov","doi":"10.1109/IVMEM51402.2020.00017","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00017","url":null,"abstract":"Jupyter notebook is a popular framework for interactive application development and data analysis. Deployment of JupyterHub on a supercomputer infrastructure would allow to combine high computing power and large storage capacity with convenience and ease of use for end users. In this work we present the architecture and deployment details of Jupyter framework in Kurchatov Institute computing infrastructure. In our setup we combined JupyterHub with CEPHfs storage system, FreeIPA user management system, customized CUDA-compatible image with worker applications and used Kubernetes as a component orchestrator.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124809545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A State-based Refinement Technique for Event-B
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00015
A. Khoroshilov, V. Kuliamin, A. Petrenko, I. Shchepetkov
Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows design errors and inconsistencies to be detected early and fixed before development starts. Such models are usually created by stepwise refinement: starting with a simple, abstract model of the system and incrementally refining it, adding more details at each subsequent refinement level. The top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B that can simplify the development of complicated models with a large gap between high-level design and implementation.
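For background, the standard Event-B refinement proof obligations that any such technique must relate to can be stated as follows (this is textbook Event-B, not the paper's new technique). Here $I$ is the abstract invariant, $J$ the gluing invariant linking abstract state $v$ to concrete state $w$, $G$/$H$ the abstract/concrete guards, and $BA_a$/$BA_c$ the before-after predicates:

```latex
% Guard strengthening (GRD) and simulation (SIM), textbook form.
\begin{align*}
  \text{(GRD)} &\quad I(v) \land J(v, w) \land H(w) \;\Rightarrow\; G(v) \\
  \text{(SIM)} &\quad I(v) \land J(v, w) \land H(w) \land BA_c(w, w')
                 \;\Rightarrow\; \exists v' \cdot BA_a(v, v') \land J(v', w')
\end{align*}
```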
{"title":"A State-based Refinement Technique for Event-B","authors":"A. Khoroshilov, V. Kuliamin, A. Petrenko, I. Shchepetkov","doi":"10.1109/IVMEM51402.2020.00015","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00015","url":null,"abstract":"Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows detecting design errors and inconsistencies early and fixing them before starting development. Such models are usually created using stepwise refinement: starting with the simple, abstract model of the system, and then incrementally refining it adding more details at each subsequent level of refinement. Top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B which can simplify the development of complicated models with a large gap between high-level design and implementation.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129483255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High Energy Physics Data Popularity: ATLAS Datasets Popularity Case Study
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00010
M. Grigorieva, E. Tretyakov, A. Klimentov, D. Golubkov, T. Korchuganova, A. Alekseev, A. Artamonov, T. Galkin
The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed, and analyzed in hundreds of computing centers. The popularity of data among individual physicists and university groups has become one of the key factors in efficient data management and processing. Popularity metrics were actively used by the experiments during LHC Run 1 and Run 2 for central data processing, making it possible to optimize data placement policies and to spread the workload more evenly over the existing computing resources. Besides central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Given the significant increase in data volume and processing time after the collider upgrade for the High Luminosity runs (2027–2036), intelligent data placement based on data access patterns becomes even more crucial than at the beginning of the LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of the computing sites where the data were processed and the locations of the home institutes of users carrying out physics analysis. Cartographic visualization based on these data allows existing data placement to be correlated with physics needs, providing a better understanding of data utilization by different categories of user tasks.
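A popularity study of this kind ultimately reduces to aggregating access records per dataset. The sketch below shows the idea on a hypothetical CSV access log with made-up column names; the actual ATLAS analysis draws on far richer metadata sources.

```python
# Aggregate a hypothetical access log into per-dataset popularity metrics.
import pandas as pd

# Assumed columns: dataset, user, site, timestamp.
log = pd.read_csv("access_log.csv", parse_dates=["timestamp"])

popularity = (
    log.groupby("dataset")
       .agg(accesses=("user", "size"),          # total access count
            unique_users=("user", "nunique"),   # breadth of interest
            sites=("site", "nunique"))          # geographic spread
       .sort_values("accesses", ascending=False)
)
print(popularity.head(10))  # the "hottest" datasets: replica candidates
```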
{"title":"High Energy Physics Data Popularity : ATLAS Datasets Popularity Case Study","authors":"M. Grigorieva, E. Tretyakov, A. Klimentov, D. Golubkov, T. Korchuganova, A. Alekseev, A. Artamonov, T. Galkin","doi":"10.1109/IVMEM51402.2020.00010","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00010","url":null,"abstract":"The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. The popularity of data among individual physicists and University groups has become one of the key factors of efficient data management and processing. It was actively used during LHC Run 1 and Run 2 by the experiments for the central data processing, and allowed the optimization of data placement policies and to spread the workload more evenly over the existing computing resources. Besides the central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Taking into account the significant increase of data volume and processing time after the collider upgrade for the High Luminosity Runs (2027– 2036) an intelligent data placement based on data access pattern becomes even more crucial than at the beginning of LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of computing sites where the data were processed, and the locality of the home institutes of users carrying out physics analysis. Cartography visualization, based on this data, allows the correlation of existing data placement with physics needs, providing a better understanding of data utilization by different categories of user’s tasks.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114823419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of Watermark Embedding Capacity with Line Space Shifting
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00011
A. Kozachok, S. Kopylov
The article describes an analytical model for evaluating the maximum achievable embedding capacity of a robust watermark based on an approach that embeds information in text data by shifting line spacing. The developed model allows assessing bounds on the amount of information that a watermark embedded into printed text data may contain. In developing the analytical model, the dependence of the maximum achievable embedding capacity on the number of lines in a text document and on the watermark embedding parameters was established. The relationship between the parameters of a text document and the number of lines per page is described mathematically. Calculations based on the obtained expressions and the corresponding experiments are carried out, and the simulation results are checked against the parameters of texts printed on paper. The simulation results are analyzed, and a linear dependence is established. The obtained values are approximated, yielding analytical expressions that quantify the maximum achievable embedding capacity of the developed robust watermark as a function of the embedding parameters used. The degree of tension among the following parameters of robust watermarks is estimated: embedding capacity, extractability, and robustness. The relationship between the maximum achievable embedding capacity and the extraction accuracy of the developed watermark is determined, and quantitative estimates of the influence of watermark size on the final extraction accuracy of embedded information are given. Directions for further research are outlined.
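As a back-of-the-envelope companion to the analytical model, the sketch below bounds capacity under the simplifying assumption that each of the n−1 gaps between n lines on a page encodes one bit via a small shift; all layout numbers are illustrative defaults, not the paper's parameters.

```python
# Crude upper bound on line-space-shifting capacity under the assumption
# that each inter-line gap carries one bit. Layout defaults are illustrative.

def lines_per_page(page_h_mm=297.0, top_mm=20.0, bottom_mm=20.0,
                   font_pt=14.0, spacing=1.5):
    pt_to_mm = 25.4 / 72.0
    line_h_mm = font_pt * spacing * pt_to_mm
    return int((page_h_mm - top_mm - bottom_mm) // line_h_mm)

def max_capacity_bits(pages, **layout):
    n = lines_per_page(**layout)
    return pages * max(n - 1, 0)   # n lines -> n-1 adjustable gaps

print(lines_per_page())        # ~34 lines on A4 at 14 pt, 1.5 spacing
print(max_capacity_bits(10))   # bound for a 10-page document
```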
{"title":"Estimation of Watermark Embedding Capacity with Line Space Shifting","authors":"A. Kozachok, S. Kopylov","doi":"10.1109/IVMEM51402.2020.00011","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00011","url":null,"abstract":"The article describes an analytical model of the maximum achievable embedding capacity evaluation for robust watermark based on the approach to information embedding in text data by line space shifting. The developed model allows to boundary values assessment of information amount that may contain a watermark embedded into text data printed. In the developing process of an analytical model, the dependence of maximum achievable embedding capacity on the lines amount of a text document and the used watermark embedding parameters was established. The relationship between the parameters of a text document and the lines number per page of a text document is mathematically described. Mathematical calculations of the obtained expressions and the corresponding experimental researches are conducted. The evaluation of obtained simulation results correspondence to the parameters of texts printed on paper is implemented. The simulation results are analyzed and a linear dependence of the results is established. The obtained values are approximated and analytical expressions that allow one to quantify the maximum achievable embedding capacity of the developed robust watermark depending on the embedding parameters used are received. The degree of contradictions between the following parameters of robust watermarks: embedding capacity, extractability and robustness is estimated. The relationship between the maximum achievable embedding capacity and the accuracy of the extraction of the developed watermark is determined. Quantitative estimates of the influence of the size of the watermark on the final extraction accuracy of embedded information are given. The further research directions are determined.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133987245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Possibilities of Computer Lexicography in Compiling Highly Specialized Terminological Printed and Electronic Dictionaries (Field of Aviation Engineering)
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00013
V. Ryzhkova
The article covers modern trends in compiling printed and electronic field-specific dictionaries of technical terms, addressing both theoretical and practical aspects of compiling such dictionaries.
{"title":"Possibilities of Computer Lexicography in Compiling Highly Specialized Terminological Printed and Electronic Dictionaries (Field of Aviation Engineering)","authors":"V. Ryzhkova","doi":"10.1109/IVMEM51402.2020.00013","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00013","url":null,"abstract":"The article covers the modern trends of compiling printed and electronic field-specific dictionaries of technical terms. It discloses both theoretical and practical aspects of compiling such dictionaries.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116800654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of pseudo-random sequences based on the random forest algorithm
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00016
A. Kozachok, A. Spirin, Alexander I. Kozachok, Alexey N. Tsibulia
Motivated by the increased number of information leaks caused by internal violators and the lack of mechanisms in modern DLP systems to counter information leaks in encrypted or compressed form, we propose a method for classifying sequences produced by encryption and data compression algorithms. An algorithm for constructing a random forest is proposed, and the choice of classifier hyperparameters is justified. The presented approach achieves a classification accuracy of 0.98 on the sequences considered in this work.
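A minimal sketch of such a classifier is shown below, assuming byte-histogram features and toy stand-ins for the two classes (os.urandom approximating ciphertext, zlib output approximating compressed data); the paper's actual feature set and hyperparameters may differ.

```python
# Toy random-forest classifier for "encrypted vs compressed" sequences.
import os
import zlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def byte_histogram(seq: bytes) -> np.ndarray:
    """Normalized 256-bin byte frequency vector."""
    counts = np.bincount(np.frombuffer(seq, dtype=np.uint8), minlength=256)
    return counts / max(len(seq), 1)

# Toy stand-ins for the two classes.
rng = np.random.default_rng(0)
enc = [os.urandom(4096) for _ in range(200)]
com = [zlib.compress(rng.integers(0, 64, 8192, dtype=np.uint8).tobytes())
       for _ in range(200)]

X = np.array([byte_histogram(s) for s in enc + com])
y = np.array([0] * len(enc) + [1] * len(com))  # 0 = encrypted, 1 = compressed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy on the toy data
```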
{"title":"Classification of pseudo-random sequences based on the random forest algorithm","authors":"A. Kozachok, A. Spirin, Alexander I. Kozachok, Alexey N. Tsibulia","doi":"10.1109/IVMEM51402.2020.00016","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00016","url":null,"abstract":"Due to the increased number of information leaks caused by internal violators and the lack of mechanisms in modern DLP systems to counter information leaks in encrypted or compressed form, was proposed a method for classifying sequences formed by encryption and data compression algorithms. An algorithm for constructing a random forest was proposed, and the choice of classifier hyper parameters was justified. The presented approach showed the accuracy of classification of the sequences specified in the work 0.98.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127685838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Determining Soil Parameters
Pub Date: 2020-09-01 | DOI: 10.1109/IVMEM51402.2020.00020
S. Zasukhin, E. Zasukhina
The problem of determining soil parameters is considered. Exact knowledge of these parameters is of great importance for planning and managing water systems, assessing the possible size of catastrophic floods, etc. We propose to find these parameters by solving an optimal control problem in which the controlled process is described by the Richards equation. The objective function is the mean-square deviation of the observed soil moisture values from the simulated values obtained by solving the Richards equation with the selected parameter values. Numerical optimization is performed using Newton's method, and the derivatives of the objective function are calculated using fast automatic differentiation techniques.
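The sketch below mirrors this calibration loop with a cheap placeholder forward model in place of a Richards-equation solver, and hand-written gradients where the paper uses fast automatic differentiation; the parameter values and model form are purely illustrative.

```python
# Toy parameter calibration: fit model parameters by minimizing the
# mean-square deviation from "observed" moisture with Newton's method.
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 50)

def simulate(params, t):
    """Placeholder forward model standing in for a Richards-equation solver."""
    a, b = params
    return a * np.exp(-b * t)

# Synthetic observations from known parameters plus a little noise.
observed = simulate((0.4, 1.3), t) \
    + 0.005 * np.random.default_rng(1).standard_normal(t.size)

def objective(params):
    r = simulate(params, t) - observed
    return 0.5 * np.mean(r ** 2)       # mean-square deviation, as in the paper

def gradient(params):
    # Hand-derived gradient for the toy model (the paper uses fast AD).
    a, b = params
    e = np.exp(-b * t)
    r = a * e - observed
    return np.array([np.mean(r * e), np.mean(r * a * -t * e)])

res = minimize(objective, x0=np.array([1.0, 1.0]),
               jac=gradient, method="Newton-CG", options={"xtol": 1e-8})
print(res.x)  # should land near the true (0.4, 1.3)
```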
{"title":"Determining Soil Parameters","authors":"S. Zasukhin, E. Zasukhina","doi":"10.1109/IVMEM51402.2020.00020","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00020","url":null,"abstract":"The problem of determining soil parameters is considered. Their exact knowledge is of great importance for planning and managing water systems, assessing the possible size of catastrophic floods, etc. These parameters are proposed to be found by solving some optimal control problem, where the controlled process is described by the Richards equation. The objective function is mean-square deviation of the observed soil moisture values from its simulated values, which are obtained from the solution of the Richards equation with the selected parameters values. Numerical optimization is performed using Newton method. Derivatives of the objective function are calculated using fast automatic differentiation techniques.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":"123 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113944920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}