DeepWeak:通过知识图嵌入推理常见的软件弱点

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2018-03-20 DOI:10.1109/SANER.2018.8330232

Zhuobing Han, Xiaohong Li, Hongtao Liu, Zhenchang Xing, Zhiyong Feng

{"title":"DeepWeak:通过知识图嵌入推理常见的软件弱点","authors":"Zhuobing Han, Xiaohong Li, Hongtao Liu, Zhenchang Xing, Zhiyong Feng","doi":"10.1109/SANER.2018.8330232","DOIUrl":null,"url":null,"abstract":"Common software weaknesses, such as improper input validation, integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial-of-service, execution of unauthorized code. Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weakness. Although CWE contains rich information about software weaknesses, including textual descriptions, common sequences and relations between software weaknesses, the current data representation, i.e., hyperlined documents, does not support advanced reasoning tasks on software weaknesses, such as prediction of missing relations and common consequences of CWEs. Such reasoning tasks become critical to managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and develop a translation-based, description-embodied knowledge representation learning method to embed both software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of software weakness and relation embeddings in three reasoning tasks, including CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"8 1","pages":"456-466"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":"{\"title\":\"DeepWeak: Reasoning common software weaknesses via knowledge graph embedding\",\"authors\":\"Zhuobing Han, Xiaohong Li, Hongtao Liu, Zhenchang Xing, Zhiyong Feng\",\"doi\":\"10.1109/SANER.2018.8330232\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Common software weaknesses, such as improper input validation, integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial-of-service, execution of unauthorized code. Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weakness. Although CWE contains rich information about software weaknesses, including textual descriptions, common sequences and relations between software weaknesses, the current data representation, i.e., hyperlined documents, does not support advanced reasoning tasks on software weaknesses, such as prediction of missing relations and common consequences of CWEs. Such reasoning tasks become critical to managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and develop a translation-based, description-embodied knowledge representation learning method to embed both software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of software weakness and relation embeddings in three reasoning tasks, including CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.\",\"PeriodicalId\":6602,\"journal\":{\"name\":\"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)\",\"volume\":\"8 1\",\"pages\":\"456-466\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"30\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SANER.2018.8330232\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330232","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 30

摘要

常见的软件弱点，如不正确的输入验证、整数溢出，会直接或间接地损害系统安全性，导致诸如拒绝服务、执行未经授权的代码等不利影响。通用弱点枚举(Common Weakness Enumeration, CWE)维护一个通用软件弱点的标准列表和分类。尽管CWE包含了关于软件弱点的丰富信息，包括文本描述、常见序列和软件弱点之间的关系，但当前的数据表示，即超链接文档，不支持对软件弱点的高级推理任务，例如预测缺失关系和CWE的常见后果。这样的推理任务对于管理和分析大量常见软件弱点及其关系变得至关重要。本文提出将常见的软件弱点及其关系表示为知识图，并开发了一种基于翻译的、描述体现的知识表示学习方法，将知识图中的软件弱点及其关系嵌入到语义向量空间中。软件弱点及其关系的向量表示(即嵌入)可以用于知识获取和推理。我们进行了大量的实验来评估软件弱点和关系嵌入在三个推理任务中的性能，包括CWE链接预测、CWE三重分类和常见结果预测。我们的知识图嵌入方法优于其他基于描述和/或结构的表示学习方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

DeepWeak: Reasoning common software weaknesses via knowledge graph embedding

Common software weaknesses, such as improper input validation, integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial-of-service, execution of unauthorized code. Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weakness. Although CWE contains rich information about software weaknesses, including textual descriptions, common sequences and relations between software weaknesses, the current data representation, i.e., hyperlined documents, does not support advanced reasoning tasks on software weaknesses, such as prediction of missing relations and common consequences of CWEs. Such reasoning tasks become critical to managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and develop a translation-based, description-embodied knowledge representation learning method to embed both software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of software weakness and relation embeddings in three reasoning tasks, including CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

自引率

0.00%

发文量

期刊最新文献

Exploring the integration of user feedback in automated testing of Android applications The Statechart Workbench: Enabling scalable software event log analysis using process mining Detecting code smells using machine learning techniques: Are we there yet? Classifying stack overflow posts on API issues Re-evaluating method-level bug prediction