Yujie Yuan, Lihua Xu, Xusheng Xiao, Andy Podgurski, Huibiao Zhu
Fault localization is a well-received technique for helping developers to identify faulty statements of a program. Research has shown that the coverages of faulty statements and its predecessors in program dependence graph are important for effective fault localization. However, app executions in Android split into segments in different components, i.e., methods, threads, and processes, posing challenges for traditional program dependence computation, and in turn rendering fault localization less effective. We present RunDroid, a tool for recovering the dynamic call graphs of app executions in Android, assisting existing tools for more precise program dependence computation. For each exectuion, RunDroid captures and recovers method calls from not only the application layer, but also between applications and the Android framework. Moreover, to deal with the widely adopted multi-threaded communications in Android applications, RunDroid also captures methods calls that are split among threads. Demo : https://github.com/MiJack/RunDroid Video : https://youtu.be/EM7TJbE-Oaw
{"title":"RunDroid: recovering execution call graphs for Android applications","authors":"Yujie Yuan, Lihua Xu, Xusheng Xiao, Andy Podgurski, Huibiao Zhu","doi":"10.1145/3106237.3122821","DOIUrl":"https://doi.org/10.1145/3106237.3122821","url":null,"abstract":"Fault localization is a well-received technique for helping developers to identify faulty statements of a program. Research has shown that the coverages of faulty statements and its predecessors in program dependence graph are important for effective fault localization. However, app executions in Android split into segments in different components, i.e., methods, threads, and processes, posing challenges for traditional program dependence computation, and in turn rendering fault localization less effective. We present RunDroid, a tool for recovering the dynamic call graphs of app executions in Android, assisting existing tools for more precise program dependence computation. For each exectuion, RunDroid captures and recovers method calls from not only the application layer, but also between applications and the Android framework. Moreover, to deal with the widely adopted multi-threaded communications in Android applications, RunDroid also captures methods calls that are split among threads. Demo : https://github.com/MiJack/RunDroid Video : https://youtu.be/EM7TJbE-Oaw","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129045058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Germán Regis, Renzo Degiovanni, Nicolás D'Ippolito, Nazareno Aguirre
In this paper we present CLTSA (Counting Fluents Labelled Transition System Analyser), an extension of LTSA (Labelled Transition System Analyser) that incorporates counting fluents, a useful mechanism to capture properties related to counting events. Counting fluent temporal logic is a formalism for specifying properties of event-based systems, which complements the notion of fluent by the related concept of counting fluent. While fluents allow us to capture boolean properties of the behaviour of a reactive system, counting fluents are numerical values, that enumerate event occurrences. The tool supports a superset of FSP (Finite State Processes), that allows one to define LTL properties involving counting fluents, which can be model checked on FSP processes. Detailed information can be found at http://dc.exa.unrc.edu.ar/tools/cltsa.
{"title":"CLTSA: labelled transition system analyser with counting fluent support","authors":"Germán Regis, Renzo Degiovanni, Nicolás D'Ippolito, Nazareno Aguirre","doi":"10.1145/3106237.3122828","DOIUrl":"https://doi.org/10.1145/3106237.3122828","url":null,"abstract":"In this paper we present CLTSA (Counting Fluents Labelled Transition System Analyser), an extension of LTSA (Labelled Transition System Analyser) that incorporates counting fluents, a useful mechanism to capture properties related to counting events. Counting fluent temporal logic is a formalism for specifying properties of event-based systems, which complements the notion of fluent by the related concept of counting fluent. While fluents allow us to capture boolean properties of the behaviour of a reactive system, counting fluents are numerical values, that enumerate event occurrences. The tool supports a superset of FSP (Finite State Processes), that allows one to define LTL properties involving counting fluents, which can be model checked on FSP processes. Detailed information can be found at http://dc.exa.unrc.edu.ar/tools/cltsa.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122457556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, Emad Shihab
Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintaining and the risks of breakages due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use.
{"title":"Why do developers use trivial packages? an empirical case study on npm","authors":"Rabe Abdalkareem, Olivier Nourry, Sultan Wehaibi, Suhaib Mujahid, Emad Shihab","doi":"10.1145/3106237.3106267","DOIUrl":"https://doi.org/10.1145/3106237.3106267","url":null,"abstract":"Code reuse is traditionally seen as good practice. Recent trends have pushed the concept of code reuse to an extreme, by using packages that implement simple and trivial tasks, which we call `trivial packages'. A recent incident where a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix made it imperative to question the growing use of trivial packages. Therefore, in this paper, we mine more than 230,000 npm packages and 38,000 JavaScript applications in order to study the prevalence of trivial packages. We found that trivial packages are common and are increasing in popularity, making up 16.8% of the studied npm packages. We performed a survey with 88 Node.js developers who use trivial packages to understand the reasons and drawbacks of their use. Our survey revealed that trivial packages are used because they are perceived to be well implemented and tested pieces of code. However, developers are concerned about maintaining and the risks of breakages due to the extra dependencies trivial packages introduce. To objectively verify the survey results, we empirically validate the most cited reason and drawback and find that, contrary to developers' beliefs, only 45.2% of trivial packages even have tests. However, trivial packages appear to be `deployment tested' and to have similar test, usage and community interest as non-trivial packages. On the other hand, we found that 11.5% of the studied trivial packages have more than 20 dependencies. Hence, developers should be careful about which trivial packages they decide to use.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"974 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123075354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, Lin Tan
Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to in-capabilities of test cases. Such overfitted patches are incor- rect patches, which only make all given test cases pass, but fail to fix the bugs. In this work, we propose an overfitted patch detec- tion framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test or- acles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) over- fitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to crash and memory-safety oracles employed by Opad).
{"title":"Better test cases for better automated program repair","authors":"Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, Lin Tan","doi":"10.1145/3106237.3106274","DOIUrl":"https://doi.org/10.1145/3106237.3106274","url":null,"abstract":"Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to in-capabilities of test cases. Such overfitted patches are incor- rect patches, which only make all given test cases pass, but fail to fix the bugs. In this work, we propose an overfitted patch detec- tion framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test or- acles (crash and memory-safety) to enhance validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) for deciding whether automatically-generated patches overfit. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) over- fitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to crash and memory-safety oracles employed by Opad).","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128305432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The runtime performance of a software system often depends on a large number of static parameters, which usually interact in complex ways to carry out system functionality and influence system performance. It's hard to understand such configuration spaces and find good combinations of parameter values to gain available levels of performance. Engineers in practice often just accept the default settings, leading such systems to significantly underperform relative to their potential. This problem, in turn, has impacts on cost, revenue, customer satisfaction, business reputation, and mission effectiveness. To improve the overall performance of the end-to-end systems, we propose to systematically explore (i) how to design new systems towards good performance through design space synthesis and evaluation, and (ii) how to auto-configure an existing system to obtain better performance through heuristic configuration space search. In addition, this research further studies execution traces of a system to predict runtime performance under new configurations.
{"title":"System performance optimization via design and configuration space exploration","authors":"Chong Tang","doi":"10.1145/3106237.3119880","DOIUrl":"https://doi.org/10.1145/3106237.3119880","url":null,"abstract":"The runtime performance of a software system often depends on a large number of static parameters, which usually interact in complex ways to carry out system functionality and influence system performance. It's hard to understand such configuration spaces and find good combinations of parameter values to gain available levels of performance. Engineers in practice often just accept the default settings, leading such systems to significantly underperform relative to their potential. This problem, in turn, has impacts on cost, revenue, customer satisfaction, business reputation, and mission effectiveness. To improve the overall performance of the end-to-end systems, we propose to systematically explore (i) how to design new systems towards good performance through design space synthesis and evaluation, and (ii) how to auto-configure an existing system to obtain better performance through heuristic configuration space search. In addition, this research further studies execution traces of a system to predict runtime performance under new configurations.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114991667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roel van Dijk, Christophe Creeten, J. V. D. Ham, J. V. D. Bos
Network traffic data contains a wealth of information for use in security analysis and application development. Unfortunately, it also usually contains confidential or otherwise sensitive information, prohibiting sharing and analysis. Existing automated anonymization solutions are hard to maintain and tend to be outdated. We present Privacy-Enhanced Filtering (PEF), a model-driven prototype framework that relies on declarative descriptions of protocols and a set of filter rules, which are used to automatically transform network traffic data to remove sensitive information. This paper discusses the design, implementation and application of PEF, which is available as open-source software and configured for use in a typical malware detection scenario.
{"title":"Model-driven software engineering in practice: privacy-enhanced filtering of network traffic","authors":"Roel van Dijk, Christophe Creeten, J. V. D. Ham, J. V. D. Bos","doi":"10.1145/3106237.3117777","DOIUrl":"https://doi.org/10.1145/3106237.3117777","url":null,"abstract":"Network traffic data contains a wealth of information for use in security analysis and application development. Unfortunately, it also usually contains confidential or otherwise sensitive information, prohibiting sharing and analysis. Existing automated anonymization solutions are hard to maintain and tend to be outdated. We present Privacy-Enhanced Filtering (PEF), a model-driven prototype framework that relies on declarative descriptions of protocols and a set of filter rules, which are used to automatically transform network traffic data to remove sensitive information. This paper discusses the design, implementation and application of PEF, which is available as open-source software and configured for use in a typical malware detection scenario.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129587280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenqi Bu, Minhui Xue, Lihua Xu, Yajin Zhou, Zhushou Tang, Tao Xie
Despite recent progress in program analysis techniques to identify vulnerabilities in Android apps, significant challenges still remain for applying these techniques to large-scale industrial environments. Modern software-security providers, such as Qihoo 360 and Pwnzen (two leading companies in China), are often required to process more than 10 million mobile apps at each run. In this work, we focus on effectively and efficiently identifying vulnerable usage of Internet sockets in an industrial setting. To achieve this goal, we propose a practical hybrid approach that enables lightweight yet precise detection in the industrial setting. In particular, we integrate the process of categorizing potential vulnerable apps with analysis techniques, to reduce the inevitable human inspection effort. We categorize potential vulnerable apps based on characteristics of vulnerability signatures, to reduce the burden on static analysis. We flexibly integrate static and dynamic analyses for apps in each identified family, to refine the family signatures and hence target on precise detection. We implement our approach in a practical system and deploy the system on the Pwnzen platform. By using the system, we identify and report potential vulnerabilities of 24 vulnerable apps (falling into 3 vulnerability families) to their developers, and some of these reported vulnerabilities are previously unknown. The apps of each vulnerability family in total have over 50 million downloads. We also propose countermeasures and highlight promising directions for technology transfer.
{"title":"When program analysis meets mobile security: an industrial study of misusing Android internet sockets","authors":"Wenqi Bu, Minhui Xue, Lihua Xu, Yajin Zhou, Zhushou Tang, Tao Xie","doi":"10.1145/3106237.3117764","DOIUrl":"https://doi.org/10.1145/3106237.3117764","url":null,"abstract":"Despite recent progress in program analysis techniques to identify vulnerabilities in Android apps, significant challenges still remain for applying these techniques to large-scale industrial environments. Modern software-security providers, such as Qihoo 360 and Pwnzen (two leading companies in China), are often required to process more than 10 million mobile apps at each run. In this work, we focus on effectively and efficiently identifying vulnerable usage of Internet sockets in an industrial setting. To achieve this goal, we propose a practical hybrid approach that enables lightweight yet precise detection in the industrial setting. In particular, we integrate the process of categorizing potential vulnerable apps with analysis techniques, to reduce the inevitable human inspection effort. We categorize potential vulnerable apps based on characteristics of vulnerability signatures, to reduce the burden on static analysis. We flexibly integrate static and dynamic analyses for apps in each identified family, to refine the family signatures and hence target on precise detection. We implement our approach in a practical system and deploy the system on the Pwnzen platform. By using the system, we identify and report potential vulnerabilities of 24 vulnerable apps (falling into 3 vulnerability families) to their developers, and some of these reported vulnerabilities are previously unknown. The apps of each vulnerability family in total have over 50 million downloads. We also propose countermeasures and highlight promising directions for technology transfer.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130307403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Galster, S. Angelov, Silverio Martínez-Fernández, Daniel Tofan
Software reference architectures provide templates and guidelines for designing systems in a particular domain. Companies use them to achieve interoperability of (parts of) their software, standardization, and faster development. In contrast to system-specific software architectures that "emerge" during development, reference architectures dictate significant parts of the software design early on. Agile software development frameworks (such as Scrum) acknowledge changing software requirements and the need to adapt the software design accordingly. In this paper, we present lessons learned about how reference architectures interact with Scrum (the most frequently used agile process framework). These lessons are based on observing software development projects in five companies. We found that reference architectures can support good practice in Scrum: They provide enough design upfront without too much effort, reduce documentation activities, facilitate knowledge sharing, and contribute to "architectural thinking" of developers. However, reference architectures can impose risks or even threats to the success of Scrum (e.g., to self-organizing and motivated teams).
{"title":"Reference architectures and Scrum: friends or foes?","authors":"M. Galster, S. Angelov, Silverio Martínez-Fernández, Daniel Tofan","doi":"10.1145/3106237.3117773","DOIUrl":"https://doi.org/10.1145/3106237.3117773","url":null,"abstract":"Software reference architectures provide templates and guidelines for designing systems in a particular domain. Companies use them to achieve interoperability of (parts of) their software, standardization, and faster development. In contrast to system-specific software architectures that \"emerge\" during development, reference architectures dictate significant parts of the software design early on. Agile software development frameworks (such as Scrum) acknowledge changing software requirements and the need to adapt the software design accordingly. In this paper, we present lessons learned about how reference architectures interact with Scrum (the most frequently used agile process framework). These lessons are based on observing software development projects in five companies. We found that reference architectures can support good practice in Scrum: They provide enough design upfront without too much effort, reduce documentation activities, facilitate knowledge sharing, and contribute to \"architectural thinking\" of developers. However, reference architectures can impose risks or even threats to the success of Scrum (e.g., to self-organizing and motivated teams).","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130639310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sun-Ro Lee, Min-Jae Heo, Chan-Gun Lee, Milhan Kim, Gaeul Jeong
Finding the appropriate developer for a bug report, so called `Bug Triage', is one of the bottlenecks in the bug resolution process. To address this problem, many approaches have proposed various automatic bug triage techniques in recent studies. We argue that most previous studies focused on open source projects only and did not consider deep learning techniques. In this paper, we propose to use Convolutional Neural Network and word embedding to build an automatic bug triager. The results of the experiments applied to both industrial and open source projects reveal benefits of the automatic approach and suggest co-operation of human and automatic triagers. Our experience in integrating and operating the proposed system in an industrial development environment is also reported.
{"title":"Applying deep learning based automatic bug triager to industrial projects","authors":"Sun-Ro Lee, Min-Jae Heo, Chan-Gun Lee, Milhan Kim, Gaeul Jeong","doi":"10.1145/3106237.3117776","DOIUrl":"https://doi.org/10.1145/3106237.3117776","url":null,"abstract":"Finding the appropriate developer for a bug report, so called `Bug Triage', is one of the bottlenecks in the bug resolution process. To address this problem, many approaches have proposed various automatic bug triage techniques in recent studies. We argue that most previous studies focused on open source projects only and did not consider deep learning techniques. In this paper, we propose to use Convolutional Neural Network and word embedding to build an automatic bug triager. The results of the experiments applied to both industrial and open source projects reveal benefits of the automatic approach and suggest co-operation of human and automatic triagers. Our experience in integrating and operating the proposed system in an industrial development environment is also reported.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129804101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minghui Zhou, Qingying Chen, A. Mockus, Fengguang Wu
Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers' work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a power of 1/2 increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable.
{"title":"On the scalability of Linux kernel maintainers' work","authors":"Minghui Zhou, Qingying Chen, A. Mockus, Fengguang Wu","doi":"10.1145/3106237.3106287","DOIUrl":"https://doi.org/10.1145/3106237.3106287","url":null,"abstract":"Open source software ecosystems evolve ways to balance the workload among groups of participants ranging from core groups to peripheral groups. As ecosystems grow, it is not clear whether the mechanisms that previously made them work will continue to be relevant or whether new mechanisms will need to evolve. The impact of failure for critical ecosystems such as Linux is enormous, yet the understanding of why they function and are effective is limited. We, therefore, aim to understand how the Linux kernel sustains its growth, how to characterize the workload of maintainers, and whether or not the existing mechanisms are scalable. We quantify maintainers' work through the files that are maintained, and the change activity and the numbers of contributors in those files. We find systematic differences among modules; these differences are stable over time, which suggests that certain architectural features, commercial interests, or module-specific practices lead to distinct sustainable equilibria. We find that most of the modules have not grown appreciably over the last decade; most growth has been absorbed by a few modules. We also find that the effort per maintainer does not increase, even though the community has hypothesized that required effort might increase. However, the distribution of work among maintainers is highly unbalanced, suggesting that a few maintainers may experience increasing workload. We find that the practice of assigning multiple maintainers to a file yields only a power of 1/2 increase in productivity. We expect that our proposed framework to quantify maintainer practices will help clarify the factors that allow rapidly growing ecosystems to be sustainable.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129842562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}