Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis: Latest Articles

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127

Citations: 2

Applying graph kernels to model-driven engineering problems

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127.3243128

R. Clarisó, Jordi Cabot

Machine Learning (ML) can be used to analyze and classify large collections of graph-based information, e.g., images, location information, and the structure of molecules and proteins. Graph kernels are one of the ML techniques typically used for such tasks. In a software engineering context, models of a system such as structural or architectural diagrams can be viewed as labeled graphs. Thus, in this paper we propose to employ graph kernels for clustering software modeling artifacts. Among other benefits, this would improve the efficiency and usability of a variety of software modeling activities, e.g., design space exploration, testing, or verification and validation.
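To make the idea of comparing models as labeled graphs concrete, here is a minimal sketch of one well-known graph kernel, the Weisfeiler-Lehman subtree kernel. The abstract does not say which kernel the authors use, so this choice is an assumption, and the names `wl_features`, `wl_kernel`, and the two toy models are hypothetical. The sketch also assumes node labels do not contain the characters `(`, `)` or `,`.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman labels for one labeled graph.

    adj:    dict mapping every node to a list of its neighbors
    labels: dict mapping every node to its initial string label
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbors in adj.items():
            # New label = own label plus the sorted multiset of neighbor labels.
            neighborhood = sorted(current[n] for n in neighbors)
            refined[node] = current[node] + "(" + ",".join(neighborhood) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_kernel(g1, g2, iterations=2):
    """Kernel value = dot product of the two label-count vectors."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


# Hypothetical toy models: a class with two attributes, and a class with one.
model_a = ({0: [1, 2], 1: [0], 2: [0]},
           {0: "Class", 1: "Attribute", 2: "Attribute"})
model_b = ({0: [1], 1: [0]},
           {0: "Class", 1: "Attribute"})

similarity = wl_kernel(model_a, model_b, iterations=1)
```

A matrix of such pairwise kernel values could then be passed to a kernel-based clustering method; which method suits modeling artifacts best is left open here.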

Citations: 20

Fast deployment and scoring of support vector machine models in CPU and GPU

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127.3243133

Oscar Castro-López, Inés Fernando Vega López

In this paper, we present an approach for the fast deployment and efficient scoring of Support Vector Machine (SVM) models. We developed a compiler that transforms a formal specification of an SVM and generates source code in different versions of the C/C++ language. This effectively automates the deployment of SVM models and their integration into operational software. The proposed compiler generates efficient code to deploy SVM models on CPUs (single- or multi-core) and on Graphics Processing Units (GPUs) through NVIDIA's Compute Unified Device Architecture (CUDA). We also present an empirical evaluation of our compiler's targets scoring an SVM model with a linear kernel. In our experiments, we score a real dataset in batch mode at different scales. The results show that our C CUDA implementation performs better as data scale increases, and that it is approximately 38 times faster than the single-core implementation using single-precision floating-point values.
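For readers unfamiliar with what "scoring" a linear-kernel SVM involves, the following is a minimal sketch in plain Python, not the code the authors' compiler emits. Scoring reduces to evaluating the decision value w·x + b for each input row. The function names and the example weights, bias, and rows are hypothetical.

```python
def score_linear_svm(weights, bias, rows):
    """Return the decision value w.x + b for every row in a batch."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in rows]


def predict(weights, bias, rows):
    """Map each decision value to a class label (+1 or -1)."""
    return [1 if s >= 0 else -1 for s in score_linear_svm(weights, bias, rows)]


# Hypothetical trained model and batch of three input rows.
weights = [2.0, -1.0]
bias = -0.5
batch = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

scores = score_linear_svm(weights, bias, batch)
classes = predict(weights, bias, batch)
```

Each row is scored independently of the others, which is what makes batch scoring a natural fit for multi-core CPUs and GPUs.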

Citations: 1

A deep learning approach to program similarity

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127.3243131

Niccolò Marastoni, R. Giacobazzi, M. Preda

In this work we tackle the problem of binary code similarity by using deep learning applied to binary code visualization techniques. Our idea is to represent binaries as images and then to investigate whether it is possible to recognize similar binaries by applying deep learning algorithms for image classification. In particular, we apply the proposed deep learning framework to a dataset of binary code variants obtained through code obfuscation. These binary variants exhibit similar behaviours while being syntactically different. Our results show that the problem of binary code recognition is strictly separated from simple image recognition problems. Moreover, the analysis of the results of the experiments conducted in this work leads us to the identification of interesting research challenges. For example, in order to use image recognition approaches to recognize similar binary code samples, it is important to further investigate how to build a suitable mapping from executables to images.
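As an illustration of what "representing binaries as images" can mean, here is a minimal sketch of one simple mapping: one byte per grayscale pixel, laid out row by row at a fixed width. The abstract does not specify the authors' mapping (it names the choice of mapping as an open challenge), so this layout, the function name `bytes_to_image`, and the default width are assumptions.

```python
def bytes_to_image(data, width=16):
    """Lay out raw bytes row by row as a grayscale pixel grid (values 0-255).

    The last row is padded with zero bytes so every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded_len = -(-len(data) // width) * width  # round up to a full row
    padded = data + bytes(padded_len - len(data))
    return [list(padded[i:i + width]) for i in range(0, padded_len, width)]


# Hypothetical input: the first five bytes of an ELF header.
image = bytes_to_image(b"\x7fELF\x02", width=4)
```

The resulting grid could be handed to an image classifier. Note that obfuscation may shift or rewrite bytes, so visually different images can still correspond to behaviourally similar binaries.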

Citations: 21

A language-agnostic model for semantic source code labeling

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127.3243132

Ben U. Gelman, B. Hoyle, Jessica Moore, Joshua Saxe, David Slater

Code search and comprehension have become more difficult in recent years due to the rapid expansion of available source code. Current tools lack a way to label arbitrary code at scale while maintaining up-to-date representations of new programming languages, libraries, and functionalities. Comprehensive labeling of source code enables users to search for documents of interest and obtain a high-level understanding of their contents. We use Stack Overflow code snippets and their tags to train a language-agnostic, deep convolutional neural network to automatically predict semantic labels for source code documents. On Stack Overflow code snippets, we demonstrate a mean area under ROC of 0.957 over a long-tailed list of 4,508 tags. We also manually validate the model outputs on a diverse set of unlabeled source code documents retrieved from GitHub, and obtain a top-1 accuracy of 86.6%. This strongly indicates that the model successfully transfers its knowledge from Stack Overflow snippets to arbitrary source code documents.
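The "mean area under ROC" figure can be read as a per-tag ROC AUC averaged over all tags. The sketch below shows that computation under the assumption of an unweighted (macro) average, which the abstract does not confirm. The function names and the two example tags with their labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    labels: 1 for snippets carrying the tag, 0 otherwise
    scores: the model's predicted score for the tag on each snippet
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(per_tag):
    """Unweighted mean of the per-tag AUC values."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_tag.values()]
    return sum(aucs) / len(aucs)


# Hypothetical predictions for two tags.
per_tag = {
    "python": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "sql": ([0, 1, 0], [0.1, 0.8, 0.3]),
}
result = mean_auc(per_tag)
```

Because every tag contributes equally to an unweighted mean, rare tags in a long-tailed list influence the result as much as frequent ones.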

Citations: 7

Learning-based testing for autonomous systems using spatial and temporal requirements

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-09-03 DOI: 10.1145/3243127.3243129

Hojat Khosrowjerdi, K. Meinke

Cooperating cyber-physical systems-of-systems (CO-CPS) such as vehicle platoons, robot teams or drone swarms usually have strict safety requirements on both spatial and temporal behavior. Learning-based testing is a combination of machine learning and model checking that has been successfully used for black-box requirements testing of cyber-physical systems-of-systems. We present an overview of research in progress to apply learning-based testing to evaluate spatio-temporal requirements on autonomous systems-of-systems through modeling and simulation.

Citations: 24

Automatically assessing vulnerabilities discovered by compositional analysis

Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis

Pub Date : 2018-07-24 DOI: 10.1145/3243127.3243130

Saahil Ognawala, R. Amato, A. Pretschner, Pooja Kulkarni

Testing is the most widely employed method to find vulnerabilities in real-world software programs. Compositional analysis, based on symbolic execution, is an automated testing method to find vulnerabilities in medium- to large-scale programs consisting of many interacting components. However, existing compositional analysis frameworks do not assess the severity of reported vulnerabilities. In this paper, we present a framework to analyze vulnerabilities discovered by an existing compositional analysis tool and assign CVSS3 (Common Vulnerability Scoring System v3.0) scores to them, based on various heuristics such as interaction with related components, ease of reachability, complexity of design and likelihood of accepting unsanitized input. By analyzing vulnerabilities reported with CVSS3 scores in the past, we train simple machine learning models. By presenting our interactive framework to developers of popular open-source software and other security experts, we gather feedback on our trained models and further improve the features to increase the accuracy of our predictions. By providing qualitative (based on community feedback) and quantitative (based on prediction accuracy) evidence from 21 open-source programs, we show that our severity prediction framework can effectively assist developers with assessing vulnerabilities.
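For context on what a predicted CVSS3 score means to a developer, the sketch below maps a numeric base score to the qualitative severity rating defined by the CVSS v3.0 specification. It is not part of the authors' framework, and the function name `cvss3_severity` is hypothetical.

```python
def cvss3_severity(score):
    """Map a CVSS v3.0 base score (0.0-10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS v3.0 scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 - 3.9
    if score < 7.0:
        return "Medium"    # 4.0 - 6.9
    if score < 9.0:
        return "High"      # 7.0 - 8.9
    return "Critical"      # 9.0 - 10.0


ratings = [cvss3_severity(s) for s in (0.0, 3.9, 5.0, 7.5, 9.8)]
```

A severity predictor such as the one described above could report either the numeric score or this coarser rating.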

Citations: 12