Proceedings of the 5th International Workshop on Software Mining: Latest Papers



Pub Date : 2016-09-03 DOI: 10.1145/2975961.2990476

A. Zeller

When interacting with mobile apps, do users always get what they expect? We have mined thousands of Android apps for common features such as descriptions, APIs used, data flows, and (recently) user interfaces and callbacks. Associating these with each other allows us to detect outliers: apps whose description does not fit their behavior; apps whose sensitive data flow is unusual; and user interface elements whose text or icon suggests one action, but which are actually tied to other actions. Such anomalies reveal not only bugs but actual security issues, and there is a huge treasure trove of data to be mined, abstracted, and analyzed.


Citations: 1

Automatic prediction of bug fixing effort measured by code churn size


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975964

Ferdian Thung

During software maintenance, developers often receive many bug reports. Project managers often need to manage limited resources to resolve the many bugs that a project receives. To help project managers perform their job, past studies have proposed techniques that predict the amount of time that passes between a bug report being submitted and it being resolved. However, this time period might not be representative of the actual development effort, as developers might not work on the bug right away or all the time. In the open source development setting, developers are only volunteers and might not devote their full working hours to fixing a bug in a particular open source project. In the industrial setting, developers might be asked to perform various tasks aside from fixing a particular bug. In this work, we estimate bug fixing effort in terms of code churn size. Code churn size is the number of lines of code that are added, deleted, or modified to fix the bug. Lines of code have traditionally been used to estimate effort. However, no past studies have proposed techniques to automatically predict code churn size. In this work, using code churn size as an estimate of bug fixing effort, we propose a classification-based approach that predicts, given a bug report, whether the bug fixing effort would be high or low. We have evaluated our approach on 1,029 bug reports from hadoop-common and struts2. The result is promising; we can achieve an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.612 when predicting bug fixing effort in terms of lines of code churned, which is a 22.4% improvement over a baseline.
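The churn definition in this abstract can be illustrated with a minimal sketch. It assumes the fix is available as a unified diff and counts a modified line as one deletion plus one addition; both the function names and the high/low threshold are hypothetical and are not taken from the paper.

```python
def churn_size(unified_diff: str) -> int:
    """Count added plus deleted lines in a unified diff.

    Assumption of this sketch: a modified line appears as one '-' line
    and one '+' line, so it contributes 2 to the total.
    """
    churn = 0
    for line in unified_diff.splitlines():
        # Skip file headers such as '--- a/Foo.java' and '+++ b/Foo.java'.
        # (A changed line whose own content starts with '--' or '++' would
        # also be skipped; a real tool should parse hunks properly.)
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+") or line.startswith("-"):
            churn += 1
    return churn


def effort_label(churn: int, threshold: int) -> str:
    """Binarize churn into high/low classes; the threshold is hypothetical."""
    return "high" if churn > threshold else "low"


example_diff = """\
--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,4 @@
 class Foo {
-    int x = 0;
+    int x = 1;
+    int y = 2;
 }
"""
print(churn_size(example_diff))  # one deleted line and two added lines
```

Labels produced this way could serve as the target classes for a classifier trained on bug report text, which is the general shape of the approach the abstract describes.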


Citations: 8

Duplicate issue detection for the Android open source project


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975965

Kasthuri Jayarajah, Meera Radhakrishnan, Camellia Zakaria

The Android Open Source Project (AOSP) has seen tremendous traction over the past decade, and as such, its bug repository is growing in scale. With this growth, the effort required for project members to triage incoming reports, identifying whether each is a duplicate of an issue that has already been addressed or is receiving attention, is also on the rise. In this work, we create a dataset of issues from the Android issue tracker and use standard information retrieval (IR) techniques, namely the vector space model (VSM) and latent Dirichlet allocation (LDA), to understand their capability in retrieving similar issues. Further, we combine VSM and LDA and evaluate the usefulness of the combination. We find that, overall, VSM performs better with this dataset.
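As a rough illustration of VSM-based duplicate retrieval, the sketch below ranks existing issues by TF-IDF cosine similarity to a new report. The tokenizer, the smoothed IDF formula, and the sample issue texts are assumptions made for this example, not details from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that terms present in every document keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_duplicates(new_issue, existing):
    """Return indices of existing issues, most similar first."""
    vecs = tfidf_vectors(existing + [new_issue])
    query = vecs[-1]
    scores = [cosine(query, v) for v in vecs[:-1]]
    return sorted(range(len(existing)), key=lambda i: scores[i], reverse=True)


# Hypothetical issue summaries, for illustration only.
issues = [
    "Camera app crashes when switching to video mode",
    "Bluetooth headset disconnects after screen lock",
    "Wifi drops connection on wake from sleep",
]
print(rank_duplicates("Camera crashes on switching to video", issues))
```

A triager would inspect only the top few candidates. An LDA-based variant would replace the TF-IDF vectors with per-document topic distributions.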


Citations: 1

Mining testing questions on Stack Overflow


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975966

Pavneet Singh Kochhar

During software maintenance, testing is a crucial activity to ensure the quality of code as it evolves over time. With the increasing size and complexity of software, adequate software testing has become increasingly important. Developers often ask about problems they face during testing on Community Question Answering (CQA) websites such as Stack Overflow. These websites can serve as good repositories for understanding the common topics of discussion and the challenges faced by developers during testing. In this paper, we present a study of common challenges and important topics of discussion, by mining testing-related questions asked on Stack Overflow. We use unsupervised learning to categorize the questions and rank all the Stack Overflow questions based on their importance. Our results show that topics such as test framework, database, and client-server are discussed more often than other topics. Also, there has been an uptrend in mobile development questions in testing-related discussions.


Citations: 16

On the feasibility of detecting cross-platform code clones via identifier similarity


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975967

Xiao Cheng, Lingxiao Jiang, Hao Zhong, Haibo Yu, Jianjun Zhao

More and more mobile applications run on multiple mobile operating systems to attract users of different platforms. Although versions on different platforms are implemented in different programming languages (e.g., Java and Objective-C), there must be many code snippets that implement similar business logic on different platforms. Such code snippets are called cross-platform clones. Detecting such clones is challenging but essential for software maintenance. Because developers usually use some common identifiers when implementing the same business logic on different platforms, in this paper we investigate the identifier similarity of the same mobile application on different platforms and provide insights into the feasibility of cross-platform clone detection via identifier similarity. In our experiment, we analyzed the source code of 18 open-source cross-platform applications implemented on Android, iOS, and Windows Phone, and found that the smaller the KL divergence of an application, the more accurate the clones detected via identifiers.
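To make the KL divergence measure concrete, here is a minimal sketch that splits identifiers into word tokens and compares the token distributions of two platform versions. The splitting rules, the add-alpha smoothing, and the sample identifiers are assumptions of this example; the abstract does not say exactly which distributions the authors compare.

```python
import math
import re
from collections import Counter


def identifier_tokens(identifiers):
    """Split camelCase and snake_case identifiers into lowercase word tokens."""
    tokens = []
    for ident in identifiers:
        for part in re.split(r"[_\W]+", ident):
            # Acronym runs (HTTP), capitalized or lowercase words, digit runs.
            words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
            tokens += [w.lower() for w in words]
    return tokens


def kl_divergence(p_tokens, q_tokens, alpha=1.0):
    """KL(P || Q) in bits over the joint vocabulary, with add-alpha smoothing."""
    vocab = set(p_tokens) | set(q_tokens)
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    p_total = len(p_tokens) + alpha * len(vocab)
    q_total = len(q_tokens) + alpha * len(vocab)
    kl = 0.0
    for term in vocab:
        p = (p_counts[term] + alpha) / p_total
        q = (q_counts[term] + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl


# Hypothetical identifiers from two versions of the same app.
android_ids = ["loginButton", "userName", "fetchProfile", "parseHTTPResponse"]
ios_ids = ["login_button", "userName", "loadProfile", "parseHTTPResponse"]

android_tokens = identifier_tokens(android_ids)
ios_tokens = identifier_tokens(ios_ids)
print(kl_divergence(android_tokens, ios_tokens))
```

A value near zero indicates that both versions draw on nearly the same identifier vocabulary, which is the situation in which the abstract reports identifier-based clone detection to be most accurate.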


Citations: 3

Code migration with statistical machine translation


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2990477

T. Nguyen

In modern software development, developers often need to migrate code written in one programming language for one platform to another language for a different platform. The migration process is often performed manually or semi-automatically, in which case developers are required to manually define translation rules and API mappings between languages. This talk outlines our research plan and results on using Statistical Machine Translation (SMT) to support code migration. We will explain the challenges and our solutions to address them, as well as our vision along this direction.


Citations: 2

Mining timed regular expressions from system traces


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975962

Greta Cutulenco, Yogi Joshi, Apurva Narayan, S. Fischmeister

The dynamic behavior of a program can be assessed by examining the events it emits during execution. Temporal properties define the order of occurrence of events and timing constraints on their occurrence. Such specifications are important for safety-critical real-time systems, in which a delayed response to an emitted event may lead to a fault in the system. Since temporal properties are rarely specified for programs, and because the formalisms are complex, it is desirable to suggest properties by extracting them from traces of program execution for testing, verification, anomaly detection, and debugging purposes. We propose a framework for automatically mining properties in the form of timed regular expressions (TREs) from system traces. Using an abstract structure of the property, the framework constructs a finite state machine to serve as an acceptor. As part of the framework, we propose two novel algorithms optimized for mining general TREs and a fragment without negation. The framework is evaluated on industrial-strength safety-critical real-time applications (a deployed autonomous hexacopter system and a commercial vehicle in operation) using traces with more than 1 million entries. Our framework is open source and available online: https://bitbucket.org/sfischme/tre-mining
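The idea of instantiating a property template over the events of a trace and keeping the instances that an acceptor validates can be sketched as follows. This is a deliberately simplified illustration for one fixed template ("every trigger is followed by a response within a deadline"), not the TRE mining algorithms of the paper; the function names, trace format, and sample events are all hypothetical.

```python
from itertools import permutations


def satisfies_timed_response(trace, trigger, response, max_delay):
    """Check that every `trigger` is followed by `response` within max_delay.

    `trace` is a list of (timestamp, event) pairs sorted by timestamp.
    The acceptor has two states: idle, and waiting for a response.
    A single response answers all triggers pending at that point.
    """
    waiting_since = None  # timestamp of the oldest unanswered trigger
    for timestamp, event in trace:
        if waiting_since is not None and timestamp - waiting_since > max_delay:
            return False  # deadline passed before a response arrived
        if event == response:
            waiting_since = None
        elif event == trigger and waiting_since is None:
            waiting_since = timestamp
    return waiting_since is None  # reject traces ending with an open trigger


def mine_timed_responses(trace, max_delay):
    """Return all ordered event pairs for which the template holds on the trace."""
    events = sorted({event for _, event in trace})
    return [(a, b) for a, b in permutations(events, 2)
            if satisfies_timed_response(trace, a, b, max_delay)]


# Hypothetical trace of (timestamp, event) pairs.
trace = [(0, "req"), (3, "ack"), (10, "req"), (12, "ack")]
print(mine_timed_responses(trace, max_delay=5))
```

Enumerating every event pair is quadratic in the alphabet size, which is acceptable for a sketch but hints at why algorithms optimized for large traces are needed.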


Citations: 18

By the power of SMT! Mining function contracts to better bounded model checking


Pub Date : 2016-09-03 DOI: 10.1145/2975961.2975963

A. Abdullin, M. Akhin

Program analysis is rapidly changing the way we develop software; one of its more important problems is that of function contract creation, as these contracts can greatly increase the quality and performance of the analysis. However, the predominant way of creating function contracts is manual development by the end user. In this paper we present an approach that automatically collects function contracts for bounded model checking through software mining augmented with deep SMT solver integration. The prototype implementation in the Borealis bounded model checker has been evaluated on a number of programs and has proved able to find interesting contracts.


Citations: 0

Proceedings of the 5th International Workshop on Software Mining


Pub Date : 1900-01-01 DOI: 10.1145/2975961

Citations: 0
