[研究论文]任意Java库中的源和汇自动检测

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2018-09-01 DOI:10.1109/SCAM.2018.00019

Darius Sas, Marco Bessi, F. Fontana

{"title":"[研究论文]任意Java库中的源和汇自动检测","authors":"Darius Sas, Marco Bessi, F. Fontana","doi":"10.1109/SCAM.2018.00019","DOIUrl":null,"url":null,"abstract":"In the last decade, data security has become a primary concern for an increasing amount of companies around the world. Protecting the customer's privacy is now at the core of many businesses operating in any kind of market. Thus, the demand for new technologies to safeguard user data and prevent data breaches has increased accordingly. In this work, we investigate a machine learning-based approach to automatically extract sources and sinks from arbitrary Java libraries. Our method exploits several different features based on semantic, syntactic, intra-procedural dataflow and class-hierarchy traits embedded into the bytecode to distinguish sources and sinks. The performed experiments show that, under certain conditions and after some preprocessing, sources and sinks across different libraries share common characteristics that allow a machine learning model to distinguish them from the other library methods. The prototype model achieved remarkable results of 86% accuracy and 81% F-measure on our validation set of roughly 600 methods.","PeriodicalId":127335,"journal":{"name":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"[Research Paper] Automatic Detection of Sources and Sinks in Arbitrary Java Libraries\",\"authors\":\"Darius Sas, Marco Bessi, F. Fontana\",\"doi\":\"10.1109/SCAM.2018.00019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the last decade, data security has become a primary concern for an increasing amount of companies around the world. Protecting the customer's privacy is now at the core of many businesses operating in any kind of market. Thus, the demand for new technologies to safeguard user data and prevent data breaches has increased accordingly. In this work, we investigate a machine learning-based approach to automatically extract sources and sinks from arbitrary Java libraries. Our method exploits several different features based on semantic, syntactic, intra-procedural dataflow and class-hierarchy traits embedded into the bytecode to distinguish sources and sinks. The performed experiments show that, under certain conditions and after some preprocessing, sources and sinks across different libraries share common characteristics that allow a machine learning model to distinguish them from the other library methods. The prototype model achieved remarkable results of 86% accuracy and 81% F-measure on our validation set of roughly 600 methods.\",\"PeriodicalId\":127335,\"journal\":{\"name\":\"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"129 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2018.00019\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2018.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

在过去的十年中，数据安全已经成为全球越来越多的公司关注的主要问题。保护客户的隐私现在是在任何市场中运营的许多企业的核心。因此，对保护用户数据和防止数据泄露的新技术的需求也相应增加。在这项工作中，我们研究了一种基于机器学习的方法来自动从任意Java库中提取源和汇。我们的方法利用嵌入字节码的语义、句法、过程内数据流和类层次特征来区分源和汇。所进行的实验表明，在某些条件下，经过一些预处理，不同库中的源和汇具有共同的特征，这使得机器学习模型能够将它们与其他库方法区分开来。在大约600种方法的验证集上，原型模型取得了86%的准确率和81%的F-measure的显著结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

[Research Paper] Automatic Detection of Sources and Sinks in Arbitrary Java Libraries

In the last decade, data security has become a primary concern for an increasing amount of companies around the world. Protecting the customer's privacy is now at the core of many businesses operating in any kind of market. Thus, the demand for new technologies to safeguard user data and prevent data breaches has increased accordingly. In this work, we investigate a machine learning-based approach to automatically extract sources and sinks from arbitrary Java libraries. Our method exploits several different features based on semantic, syntactic, intra-procedural dataflow and class-hierarchy traits embedded into the bytecode to distinguish sources and sinks. The performed experiments show that, under certain conditions and after some preprocessing, sources and sinks across different libraries share common characteristics that allow a machine learning model to distinguish them from the other library methods. The prototype model achieved remarkable results of 86% accuracy and 81% F-measure on our validation set of roughly 600 methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)

自引率

0.00%

发文量

期刊最新文献

[Research Paper] Untangling Composite Commits Using Program Slicing [Engineering Paper] Built-in Clone Detection in Meta Languages [Research Paper] Static JavaScript Call Graphs: A Comparative Study [Engineering Paper] Challenges of Implementing Cross Translation Unit Analysis in Clang Static Analyzer [Engineering Paper] Graal: The Quest for Source Code Knowledge