M. Rapoport, Philippe Suter, Erik Wittern, O. Lhoták, Julian T Dolby
Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionality, or send user-generated content or metadata. Collectively, these requests reveal common practices of mobile application development, such as which external services are used and how, and they point to possible negative effects, such as security and privacy violations or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30,000 Android applications and compare its performance with that of a simpler constant extraction analysis. Finally, we discuss the advantages and limitations of dynamic and static analyses for extracting URLs, comparing the data extracted by Stringoid from the same 20 applications with the dynamically collected data.
{"title":"Who you gonna call?: analyzing web requests in Android applications","authors":"M. Rapoport, Philippe Suter, Erik Wittern, O. Lhoták, Julian T Dolby","doi":"10.1109/MSR.2017.11","DOIUrl":"https://doi.org/10.1109/MSR.2017.11","url":null,"abstract":"Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point to possible negative effects like security and privacy violations, or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30, 000 Android applications, and compare the performance with a simpler constant extraction analysis. Finally, we present a discussion of the advantages and limitations of dynamic and static analyses when extracting URLs, as we compare the data extracted by Stringoid from the same 20 applications with the dynamically collected data.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"60 1","pages":"80-90"},"PeriodicalIF":0.0,"publicationDate":"2017-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84554111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Ishio, R. Kula, Tetsuya Kanda, D. Germán, Katsuro Inoue
A software product often depends on a large number of third-party components. To assess potential risks, such as security vulnerabilities and license violations, a list of the components and their versions in a product is important for release engineers and security analysts. Since such a list is not always available, a code comparison technique named Software Bertillonage has been proposed to test whether a product likely includes a copy of a particular component. Although the technique can extract candidate reused components, a user still has to manually identify the original components among the candidates. In this paper, we propose a method to automatically select the most likely origin of components reused in a product, based on the assumption that a product tends to include an entire copy of a component rather than a partial copy. More concretely, given a Java product and a repository of jar files of existing components, our method greedily selects jar files that can provide Java classes to the product. To compare the method with the existing technique, we conducted an evaluation using randomly created jar files including up to 1,000 components. The Software Bertillonage technique reports many candidates; its precision and recall are 0.357 and 0.993, respectively. Our method reports a list of original components with a precision of 0.998 and a recall of 0.997.
{"title":"Software Ingredients: Detection of Third-Party Component Reuse in Java Software Release","authors":"T. Ishio, R. Kula, Tetsuya Kanda, D. Germán, Katsuro Inoue","doi":"10.1145/2901739.2901773","DOIUrl":"https://doi.org/10.1145/2901739.2901773","url":null,"abstract":"A software product is often dependent on a large number of third-party components.To assess potential risks, such as security vulnerabilities and license violations, a list of components and their versions in a product is important for release engineers and security analysts.Since such a list is not always available, a code comparison technique named Software Bertillonage has been proposed to test whether a product likely includes a copy of a particular component or not.Although the technique can extract candidates of reused components, a user still has to manually identify the original components among the candidates.In this paper, we propose a method to automatically select the most likely origin of components reused in a product, based on an assumption that a product tends to include an entire copy of a component rather than a partial copy.More concretely, given a Java product and a repository of jar files of existing components, our method selects jar files that can provide Java classes to the product in a greedy manner.To compare the method with the existing technique, we have conducted an evaluation using randomly created jar files including up to 1,000 components.The Software Bertillonage technique reports many candidates; the precision and recall are 0.357 and 0.993, respectively.Our method reports a list of original components whose precision and recall are 0.998 and 0.997.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"373 1","pages":"339-350"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74871173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During software evolution, the source code of a system frequently changes due to bug fixes or new feature requests. Some of these changes may accidentally degrade the performance of a newly released software version. A notable problem in regression testing is how to find the problematic changes, out of a large number of committed changes, that may be responsible for performance regressions under certain test inputs. We propose a novel recommendation system, coined PerfImpact, for automatically identifying code changes that may potentially be responsible for performance regressions, using a combination of search-based input profiling and change impact analysis techniques. PerfImpact independently sends the same input values to two releases of the application under test and uses a genetic algorithm to mine execution traces and explore a large space of input value combinations to find specific inputs that take longer to execute in the new release. Since these input values are likely to expose performance regressions, PerfImpact automatically mines the corresponding execution traces to evaluate the impact of each code change on performance and ranks the changes by their estimated contribution to performance regressions. We implemented PerfImpact and evaluated it on different releases of two open-source web applications. The results demonstrate that PerfImpact effectively detects input value combinations that expose performance regressions and mines the code changes that are likely responsible for them.
{"title":"Mining Performance Regression Inducing Code Changes in Evolving Software","authors":"Qi Luo, D. Poshyvanyk, M. Grechanik","doi":"10.1145/2901739.2901765","DOIUrl":"https://doi.org/10.1145/2901739.2901765","url":null,"abstract":"During software evolution, the source code of a system frequently changes due to bug fixes or new feature requests. Some of these changes may accidentally degrade performance of a newly released software version. A notable problem of regression testing is how to find problematic changes (out of a large number of committed changes) that may be responsible for performance regressions under certain test inputs.We propose a novel recommendation system, coined as PefImpact, for automatically identifying code changes that may potentially be responsible for performance regressions using a combination of search-based input profiling and change impact analysis techniques. PefImpact independently sends the same input values to two releases of the application under test, and uses a genetic algorithm to mine execution traces and explore a large space of input value combinations to find specific inputs that take longer time to execute in a new release. Since these input values are likely to expose performance regressions, PefImpact automatically mines the corresponding execution traces to evaluate the impact of each code change on the performance and ranks the changes based on their estimated contribution to performance regressions. We implemented PefImpact and evaluated it on different releases of two open-source web applications. The results demonstrate that PefImpact effectively detects input value combinations to expose performance regressions and mines the code changes are likely to be responsible for these performance regressions.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"20 1","pages":"25-36"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85055323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as the basis for automated tools that provide working code solutions to specific natural language queries. With the goal of developing automated tools with the Stack Overflow snippets and surrounding text, this paper investigates the following questions: (1) How usable are the Stack Overflow code snippets? and (2) When using text search engines for matching on the natural language questions and answers around the snippets, what percentage of the top results contain usable code snippets? A total of 3M code snippets are analyzed across four languages: C#, Java, JavaScript, and Python. Python and JavaScript proved to be the languages for which the most code snippets are usable. Conversely, Java and C# proved to be the languages with the lowest usability rate. Further qualitative analysis on usable Python snippets shows the characteristics of the answers that solve the original question. Finally, we use Google search to investigate the alignment of usability and the natural language annotations around code snippets, and explore how to make snippets in Stack Overflow an adequate base for future automatic program generation.
{"title":"From Query to Usable Code: An Analysis of Stack Overflow Code Snippets","authors":"Di Yang, Aftab Hussain, C. Lopes","doi":"10.1145/2901739.2901767","DOIUrl":"https://doi.org/10.1145/2901739.2901767","url":null,"abstract":"Enriched by natural language texts, Stack Overflow code snippets arean invaluable code-centric knowledge base of small units ofsource code. Besides being useful for software developers, theseannotated snippets can potentially serve as the basis for automatedtools that provide working code solutions to specific natural languagequeries. With the goal of developing automated tools with the Stack Overflowsnippets and surrounding text, this paper investigates the followingquestions: (1) How usable are the Stack Overflow code snippets? and(2) When using text search engines for matching on the naturallanguage questions and answers around the snippets, what percentage ofthe top results contain usable code snippets?A total of 3M code snippets are analyzed across four languages: C#,Java, JavaScript, and Python. Python and JavaScript proved to be thelanguages for which the most code snippets are usable. Conversely,Java and C# proved to be the languages with the lowest usabilityrate. Further qualitative analysis on usable Python snippets showsthe characteristics of the answers that solve the original question. Finally,we use Google search to investigate the alignment ofusability and the natural language annotations around code snippets, andexplore how to make snippets in Stack Overflow anadequate base for future automatic program generation.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"14 1","pages":"391-401"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90206597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, R. Tonelli, M. Marchesi, Bram Adams
Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers' emotions, sentiments and politeness (affects for short). However, research on affect detection in software artefacts is still in its early stage due to the lack of manually validated data and tools. In this paper, we contribute to the research on affects in software artefacts by providing a labeling of the emotions present in issue comments. We manually labeled 2,000 issue comments and 4,000 sentences written by developers with emotions such as love, joy, surprise, anger, sadness and fear. Labeled comments and sentences are linked to software artefacts reported in our previously published dataset (containing more than 1K projects, more than 700K issue reports and more than 2 million issue comments). The enriched dataset presented in this paper allows the investigation of the role of affects in software development.
{"title":"The Emotional Side of Software Developers in JIRA","authors":"Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, R. Tonelli, M. Marchesi, Bram Adams","doi":"10.1145/2901739.2903505","DOIUrl":"https://doi.org/10.1145/2901739.2903505","url":null,"abstract":"ABSTRACTIssue tracking systems store valuable data for testing hy-potheses concerning maintenance, building statistical pre-diction models and (recently) investigating developer affec-tiveness. For the latter, issue tracking systems can be minedto explore developers emotions, sentiments and politeness, affects for short. However, research on affect detection insoftware artefacts is still in its early stage due to the lack ofmanually validated data and tools.In this paper, we contribute to the research of affectson software artefacts by providing a labeling of emotionspresent on issue comments.We manually labeled 2,000 issue comments and 4,000 sen-tences written by developers with emotions such as love,joy, surprise, anger, sadness and fear. Labeled commentsand sentences are linked to software artefacts reported inour previously published dataset (containing more than 1Kprojects, more than 700K issue reports and more than 2million issue comments). The enriched dataset presented inthis paper allows the investigation of the role of affects insoftware development.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"2017 1","pages":"480-483"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79690806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. A. Thompson, G. Murphy, Marc Palyart, Marko Gasparic
Software developers use issues as a means to describe a range of activities to be undertaken on a software system, including features to be added and defects that require fixing. When creating issues, software developers expend manual effort to specify relationships between issues, such as one issue blocking another or one issue being a sub-task of another. In particular, developers use a variety of relationships to express how work is to be broken down on a project. To better understand how software developers use work breakdown relationships between issues, we manually coded a sample of work breakdown relationships from three open source systems. We report on our findings and describe how the recognition of work breakdown relationships opens up new ways to improve software development techniques.
{"title":"How Software Developers Use Work Breakdown Relationships in Issue Repositories","authors":"C. A. Thompson, G. Murphy, Marc Palyart, Marko Gasparic","doi":"10.1145/2901739.2901779","DOIUrl":"https://doi.org/10.1145/2901739.2901779","url":null,"abstract":"Software developers use issues as a means to describe a range of activities to be undertaken on a software system, including features to be added and defects that require fixing. When creating issues, software developers expend manual effort to specify relationships between issues, such as one issue blocking another or one issue being a sub-task of another. In particular, developers use a variety of relationships to express how work is to be broken down on a project. To better understand how software developers use work breakdown relationships between issues, we manually coded a sample of work breakdown relationships from three open source systems. We report on our findings and describe how the recognition of work breakdown relationships opens up new ways to improve software development techniques.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"281-285"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75584873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kostadin Damevski, Hui Chen, D. Shepherd, L. Pollock
Using IDE usage data to analyze the behavior of software developers in the field, during the course of their daily work, can lend support to (or dispute) laboratory studies of developers. This paper describes a technique that leverages Hidden Markov Models (HMMs) as a means of mining high-level developer behavior from low-level IDE interaction traces of many developers in the field. HMMs use dual stochastic processes to model higher-level hidden behavior using observable input sequences of events. We propose an interactive approach to mining interpretable HMMs, based on guiding a human expert in building a high-quality HMM in an iterative, one-state-at-a-time manner. The final result is a model that is both representative of the field data and captures the field phenomena of interest. We apply our HMM construction approach to study debugging behavior, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc. Our results highlight the different modes and constituent actions in debugging exhibited by the developers in our dataset.
{"title":"Interactive Exploration of Developer Interaction Traces using a Hidden Markov Model","authors":"Kostadin Damevski, Hui Chen, D. Shepherd, L. Pollock","doi":"10.1145/2901739.2901741","DOIUrl":"https://doi.org/10.1145/2901739.2901741","url":null,"abstract":"Using IDE usage data to analyze the behavior of software developers in the field, during the course of their daily work, can lend support to (or dispute) laboratory studies of devel- opers. This paper describes a technique that leverages Hidden Markov Models (HMMs) as a means of mining high-level developer behavior from low-level IDE interaction traces of many developers in the field. HMMs use dual stochastic processes to model higher-level hidden behavior using observable input sequences of events. We propose an interactive approach of mining interpretable HMMs, based on guiding a human expert in building a high quality HMM in an iterative, one state at a time, manner. The final result is a model that is both representative of the field data and captures the field phenomena of interest. We apply our HMM construction approach to study debugging behavior, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc. Our results highlight the different modes and constituent actions in debugging, exhibited by the developers in our dataset.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"19 1","pages":"126-136"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73342221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, Shane McIntosh
Just-In-Time (JIT) defect prediction models aim to predict the commits that will introduce defects in the future. Traditionally, JIT defect prediction models are trained using metrics that are primarily derived from aspects of the code change itself (e.g., the size of the change, the author's prior experience). In addition to the code that is submitted during a commit, authors write commit messages, which describe the commit for archival purposes. It is our position that the level of detail in these commit messages can provide additional explanatory power to JIT defect prediction models. Hence, in this paper, we analyze the relationship between the defect proneness of commits and commit message volume (i.e., the length of the commit message) and commit message content (approximated using spam filtering technology). Through analysis of JIT models that were trained using 342 GitHub repositories, we find that our JIT models outperform random guessing models, achieving AUC and Brier scores ranging from 0.63 to 0.96 and from 0.01 to 0.21, respectively. Furthermore, our metrics derived from commit message detail provide a statistically significant boost to the explanatory power of the JIT models in 43%–80% of the studied systems, accounting for up to 72% of the explanatory power. Future JIT studies should consider adding commit message detail metrics.
{"title":"The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub","authors":"Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, Shane McIntosh","doi":"10.1145/2901739.2903496","DOIUrl":"https://doi.org/10.1145/2901739.2903496","url":null,"abstract":"Just-In-Time (JIT) defect prediction models aim to predict the commits that will introduce defects in the future. Traditionally, JIT defect prediction models are trained using metrics that are primarily derived from aspects of the code change itself (e.g., the size of the change, the author’s prior experience). In addition to the code that is submitted during a commit, authors write commit messages, which describe the commit for archival purposes. It is our position that the level of detail in these commit messages can provide additional explanatory power to JIT defect prediction models. Hence, in this paper, we analyze the relationship between the defect proneness of commits and commit message volume (i.e., the length of the commit message) and commit message content (approximated using spam filtering technology). Through analysis of JIT models that were trained using 342 GitHub repositories, we find that our JIT models outperform random guessing models, achieving AUC and Brier scores that range between 0.63-0.96 and 0.01-0.21, respectively. Furthermore, our metrics that are derived from commit message detail provide a statistically significant boost to the explanatory power to the JIT models in 43%-80% of the studied systems, accounting for up to 72% of the explanatory power. Future JIT studies should consider adding commit message detail metrics.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"120 1","pages":"496-499"},"PeriodicalIF":0.0,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84064789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}