2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE): Latest Publications, Page 4

The Emerging Role of Data Scientists on Software Development Teams

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884783

Miryung Kim, Thomas Zimmermann, R. Deline, Andrew Begel

Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.

Pages: 96-107

Citations: 191

An Empirical Study on the Impact of C++ Lambdas and Programmer Experience

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884849

P. M. Uesbeck, A. Stefik, Stefan Hanenberg, J. Pedersen, P. Daleiden

Lambdas have seen increasing use in mainstream programming languages, notably in Java 8 and C++ 11. While the technical aspects of lambdas are known, we conducted the first randomized controlled trial on the human factors impact of C++ 11 lambdas compared to iterators. Because there has been recent debate on having students or professionals in experiments, we recruited undergraduates across the academic pipeline and professional programmers to evaluate these findings in a broader context. Results afford some doubt that lambdas benefit developers and show evidence that students are negatively impacted in regard to how quickly they can write correct programs to a test specification and whether they can complete a task. Analysis from log data shows that participants spent more time with compiler errors, and have more errors, when using lambdas as compared to iterators, suggesting difficulty with the syntax chosen for C++. Finally, experienced users were more likely to complete tasks, with or without lambdas, and could do so more quickly, with experience as a factor explaining 45.7% of the variance in our sample in regard to completion time.
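
To make the two styles being compared concrete, here is a minimal sketch of one small task written first with an explicit iterator and then with a lambda passed to a helper. The study itself used C++ 11; Python is used here only for illustration, and the task, the function names, and the `count_if` helper are hypothetical and not taken from the paper.

```python
def count_over_limit_iterator(values, limit):
    """Iterator style: advance an explicit iterator and test each element."""
    it = iter(values)
    count = 0
    while True:
        try:
            value = next(it)
        except StopIteration:
            break
        if value > limit:
            count += 1
    return count


def count_if(values, predicate):
    """Hypothetical helper that applies a caller-supplied predicate."""
    return sum(1 for value in values if predicate(value))


def count_over_limit_lambda(values, limit):
    """Lambda style: the test is passed as an anonymous function."""
    return count_if(values, lambda value: value > limit)
```

Both functions compute the same result; only the way the per-element test is expressed differs, which is the kind of difference the experiment measured.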

Pages: 760-771

Citations: 65

Fixing Deadlocks via Lock Pre-Acquisitions

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884819

Yan Cai, Lingwei Cao

Manual deadlock fixing is error-prone and time-consuming. The existing generic approach (GA) simply inserts gate locks to fix deadlocks by serializing executions, which could introduce various new deadlocks and incur high runtime overhead. We propose DFixer, a novel approach that fixes deadlocks without introducing any new deadlocks by design. DFixer selects only one thread of a deadlock to pre-acquire a lock w together with another lock h, where, before the fix, the deadlock occurs when that thread holds lock h and waits for lock w. As such, DFixer eliminates a hold-and-wait necessary condition, preventing the deadlock from occurring. The thread performing the pre-acquisition is carefully selected such that no other synchronization exists between the two original acquisitions. Otherwise, DFixer further introduces a context-aware conditional protected by the above lock w to guarantee the correctness of DFixer. The evaluation covers 20 deadlocks, including 17 from widely used real-world C/C++ programs. It shows that DFixer successfully fixed all deadlocks, whereas GA introduced 9 new deadlocks, and a recent work, Grail, failed to fix 8 deadlocks and introduced 3 new deadlocks on others. On average, DFixer incurred only 2.1% overhead, whereas GA and Grail incurred 15.8% and 11.5% overhead, respectively.
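
The hold-and-wait idea can be illustrated with a minimal sketch. This is not DFixer's implementation (the paper targets C/C++ programs and the abstract gives no code). It is a hand-written Python example under stated assumptions: the two threads, the names `lock_h`, `lock_w`, `thread_a_fixed`, `thread_b`, and `run_fixed`, and the use of a re-entrant lock for w are all hypothetical. Before the fix, thread A would take h and then wait for w while thread B takes w and then waits for h. After the fix, thread A takes w at the point where it takes h, so it never holds h while waiting for w.

```python
import threading

lock_h = threading.Lock()
lock_w = threading.RLock()  # re-entrant, so the original inner acquisition of w can stay
shared_total = 0


def thread_a_fixed(rounds):
    """Thread A after the hypothetical fix: w is pre-acquired together with h."""
    global shared_total
    for _ in range(rounds):
        with lock_w:          # pre-acquisition added by the fix
            with lock_h:      # original first acquisition
                with lock_w:  # original second acquisition; it no longer waits
                    shared_total += 1


def thread_b(rounds):
    """Thread B is left unchanged: it acquires w, then h."""
    global shared_total
    for _ in range(rounds):
        with lock_w:
            with lock_h:
                shared_total += 1


def run_fixed(rounds=1000):
    """Run both threads to completion and return the number of updates made."""
    global shared_total
    shared_total = 0
    workers = [
        threading.Thread(target=thread_a_fixed, args=(rounds,)),
        threading.Thread(target=thread_b, args=(rounds,)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared_total
```

Because only thread A changes and it now requests w before h, the circular wait between the two threads cannot form, and no additional gate lock is needed.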

Pages: 1109-1120

Citations: 26

Coverage-Driven Test Code Generation for Concurrent Classes

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884876

Valerio Terragni, S. Cheung

Previous techniques on concurrency testing have mainly focused on exploring the interleaving space of manually written test code to expose faulty interleavings of shared memory accesses. These techniques assume the availability of failure-inducing tests. In this paper, we present AutoConTest, a coverage-driven approach to generate effective concurrent test code that achieves high interleaving coverage. AutoConTest consists of three components. First, it computes the coverage requirements dynamically and iteratively during sequential test code generation, using a coverage metric that captures the execution context of shared memory accesses. Second, it smartly selects these sequential codes based on the computed result and assembles them for concurrent tests, achieving increased context-sensitive interleaving coverage. Third, it explores the newly covered interleavings. We have implemented AutoConTest as an automated tool and evaluated it using 6 real-world concurrent Java subjects. The results show that AutoConTest is able to generate effective concurrent tests that achieve high interleaving coverage and expose concurrency faults quickly. AutoConTest took less than 65 seconds (including program analysis, test generation and execution) to expose the faults in the program subjects.

Pages: 1121-1132

Citations: 21

Feedback-Directed Instrumentation for Deployed JavaScript Applications

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884846

Magnus Madsen, F. Tip, Esben Andreasen, Koushik Sen, Anders Møller

Many bugs in JavaScript applications manifest themselves as objects that have incorrect property values when a failure occurs. For this type of error, stack traces and log files are often insufficient for diagnosing problems. In such cases, it is helpful for developers to know the control flow path from the creation of an object to a crashing statement. Such crash paths are useful for understanding where the object originated and whether any properties of the object were corrupted since its creation. We present a feedback-directed instrumentation technique for computing crash paths that allows the instrumentation overhead to be distributed over a crowd of users and to reduce it for users who do not encounter the crash. We implemented our technique in a tool, Crowdie, and evaluated it on 10 real-world issues for which error messages and stack traces are insufficient to isolate the problem. Our results show that feedback-directed instrumentation requires 5% to 25% of the program to be instrumented, that the same crash must be observed 3 to 10 times to discover the crash path, and that feedback-directed instrumentation typically slows down execution by a factor of 2x–9x compared to 8x–90x for an approach where applications are fully instrumented.

Pages: 899-910

Citations: 19

Grounded Theory in Software Engineering Research: A Critical Review and Guidelines

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884833

Klaas-Jan Stol, P. Ralph, Brian Fitzgerald

Grounded Theory (GT) has proved an extremely useful research approach in several fields including medical sociology, nursing, education and management theory. However, GT is a complex method based on an inductive paradigm that is fundamentally different from the traditional hypothetico-deductive research model. As there are at least three variants of GT, some ostensibly GT research suffers from method slurring, where researchers adopt an arbitrary subset of GT practices that are not recognizable as GT. In this paper, we describe the variants of GT and identify the core set of GT practices. We then analyze the use of grounded theory in software engineering. We carefully and systematically selected 98 articles that mention GT, of which 52 explicitly claim to use GT, with the other 46 using GT techniques only. Only 16 articles provide detailed accounts of their research procedures. We offer guidelines to improve the quality of both conducting and reporting GT studies. The latter is an important extension since current GT guidelines in software engineering do not cover the reporting process, despite good reporting being necessary for evaluating a study and informing subsequent research.

Pages: 120-131

Citations: 401

An Empirical Study of Practitioners' Perspectives on Green Software Engineering

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884810

Irene Manotas, C. Bird, Rui Zhang, D. Shepherd, Ciera Jaspan, Caitlin Sadowski, L. Pollock, J. Clause

The energy consumption of software is an increasing concern as the use of mobile applications, embedded systems, and data center-based services expands. While research in green software engineering is correspondingly increasing, little is known about the current practices and perspectives of software engineers in the field. This paper describes the first empirical study of how practitioners think about energy when they write requirements, design, construct, test, and maintain their software. We report findings from a quantitative, targeted survey of 464 practitioners from ABB, Google, IBM, and Microsoft, which was motivated by and supported with qualitative data from 18 in-depth interviews with Microsoft employees. The major findings and implications from the collected data contextualize existing green software engineering research and suggest directions for researchers aiming to develop strategies and tools to help practitioners improve the energy usage of their applications.

Citations: 132

Augmenting API Documentation with Insights from Stack Overflow

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884800

Christoph Treude, M. Robillard

Software developers need access to different kinds of information, which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine-learning-based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.

Pages: 392-403

Citations: 235

Automatic Model Generation from Documentation for Java API Functions

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884881

Juan Zhai, Jianjun Huang, Shiqing Ma, X. Zhang, Lin Tan, Jianhua Zhao, Feng Qin

Modern software systems are becoming increasingly complex, relying on a lot of third-party library support. Library behaviors are hence an integral part of software behaviors. Analyzing them is as important as analyzing the software itself. However, analyzing libraries is highly challenging due to the lack of source code, implementation in different languages, and complex optimizations. We observe that many Java library functions provide excellent documentation, which concisely describes the functionalities of the functions. We develop a novel technique that can construct models for Java API functions by analyzing the documentation. These models are simpler implementations in Java compared to the original ones and hence easier to analyze. More importantly, they provide the same functionalities as the original functions. Our technique successfully models 326 functions from 14 widely used Java classes. We also use these models in static taint analysis on Android apps and dynamic slicing for Java programs, demonstrating the effectiveness and efficiency of our models.

Citations: 46

Missing Data Imputation Based on Low-Rank Recovery and Semi-Supervised Regression for Software Effort Estimation

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)

Pub Date : 2016-05-14 DOI: 10.1145/2884781.2884827

Xiaoyuan Jing, Fumin Qi, Fei Wu, Baowen Xu

Software effort estimation (SEE) is a crucial step in software development. Missing effort data is common in real-world data collection. Existing SEE methods address the missing data problem with a deletion, ignoring, or imputation strategy, of which the imputation strategy was found to be more helpful for improving estimation performance. Current imputation methods in SEE rely on classical imputation techniques, yet these techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the missing effort data problem. Incomplete data covers two cases: missing drive factors and missing effort labels. We introduce the low-rank recovery technique to address the missing drive factor case, and we employ the semi-supervised regression technique to impute missing effort labels. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach can obtain better effort data imputation effects than other methods; (2) the data imputed by our approach works well with multiple estimators.
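To make the idea of low-rank recovery for missing factor values more concrete, here is a minimal sketch in Python. It is not the LRSRI algorithm from the paper, whose formulation the abstract does not give. It swaps in a much simpler technique, rank-1 matrix completion by alternating least squares, and the function name `complete_rank1`, the `iters` parameter, and the toy matrix are all assumptions made for illustration.

```python
def complete_rank1(matrix, iters=200):
    """Fill None entries of `matrix` with a rank-1 fit u[i] * v[j].

    Illustrative only: alternating least squares over the observed
    entries, assuming the data is well approximated by rank 1.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # one factor per project (row)
    v = [1.0] * n_cols  # one factor per feature (column)

    for _ in range(iters):
        # Refit each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Refit each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den

    # Keep observed values; fill only the missing cells.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical toy data: rows are projects, columns are factor values.
# The complete matrix would be the outer product of [1, 2, 3] and [2, 4, 6].
toy = [
    [2.0, 4.0, None],
    [4.0, 8.0, 12.0],
    [None, 12.0, 18.0],
]
filled = complete_rank1(toy)
print(filled[0][2], filled[2][0])  # both missing cells come out close to 6.0
```

Real effort datasets are rarely exactly rank 1, so a practical method would need a higher-rank or regularized formulation. The sketch only shows how the structure shared across rows and columns can determine a missing cell.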

Pages: 607-618

Citations: 25