Bianca Trinkenreich, I. Wiese, A. Sarma, M. Gerosa, Igor Steinmacher
Women are underrepresented in Open Source Software (OSS) projects, as a result of which, not only do women lose career and skill development opportunities, but the projects themselves suffer from a lack of diversity of perspectives. Practitioners and researchers need to understand more about the phenomenon; however, studies about women in open source are spread across multiple fields, including information systems, software engineering, and social science. This article systematically maps, aggregates, and synthesizes the state-of-the-art on women’s participation in OSS. It focuses on women contributors’ representation and demographics, how they contribute, their motivations and challenges, and strategies employed by communities to attract and retain women. We identified 51 articles (published between 2000 and 2021) that investigated women’s participation in OSS. We found evidence in these papers about who are the women who contribute, what motivates them to contribute, what types of contributions they make, challenges they face, and strategies proposed to support their participation. According to these studies, only about 5% of projects were reported to have women as core developers, and women authored less than 5% of pull-requests, but had similar or even higher rates of pull-request acceptances than men. Women make both code and non-code contributions, and their motivations to contribute include learning new skills, altruism, reciprocity, and kinship. Challenges that women face in OSS are mainly social, including lack of peer parity and non-inclusive communication from a toxic culture. We found 10 strategies reported in the literature, which we mapped to the reported challenges. Based on these results, we provide guidelines for future research and practice.
{"title":"Women’s Participation in Open Source Software: A Survey of the Literature","authors":"Bianca Trinkenreich, I. Wiese, A. Sarma, M. Gerosa, Igor Steinmacher","doi":"10.1145/3510460","DOIUrl":"https://doi.org/10.1145/3510460","url":null,"abstract":"Women are underrepresented in Open Source Software (OSS) projects, as a result of which, not only do women lose career and skill development opportunities, but the projects themselves suffer from a lack of diversity of perspectives. Practitioners and researchers need to understand more about the phenomenon; however, studies about women in open source are spread across multiple fields, including information systems, software engineering, and social science. This article systematically maps, aggregates, and synthesizes the state-of-the-art on women’s participation in OSS. It focuses on women contributors’ representation and demographics, how they contribute, their motivations and challenges, and strategies employed by communities to attract and retain women. We identified 51 articles (published between 2000 and 2021) that investigated women’s participation in OSS. We found evidence in these papers about who are the women who contribute, what motivates them to contribute, what types of contributions they make, challenges they face, and strategies proposed to support their participation. According to these studies, only about 5% of projects were reported to have women as core developers, and women authored less than 5% of pull-requests, but had similar or even higher rates of pull-request acceptances than men. Women make both code and non-code contributions, and their motivations to contribute include learning new skills, altruism, reciprocity, and kinship. Challenges that women face in OSS are mainly social, including lack of peer parity and non-inclusive communication from a toxic culture. We found 10 strategies reported in the literature, which we mapped to the reported challenges. Based on these results, we provide guidelines for future research and practice.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"387 1","pages":"1 - 37"},"PeriodicalIF":0.0,"publicationDate":"2021-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77485156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Angelo Ferrando, Louise Dennis, R. C. Cardoso, Michael Fisher, D. Ancona, V. Mascardi
When applying formal verification to a system that interacts with the real world, we must use a model of the environment. This model represents an abstraction of the actual environment, so it is necessarily incomplete and hence presents an issue for system verification. If the actual environment matches the model, then the verification is correct; however, if the environment falls outside the abstraction captured by the model, then we cannot guarantee that the system is well behaved. A solution to this problem consists in exploiting the model of the environment used for statically verifying the system’s behaviour and, if the verification succeeds, using it also for validating the model against the real environment via runtime verification. The article discusses this approach and demonstrates its feasibility by presenting its implementation on top of a framework integrating the Agent Java PathFinder model checker. A high-level Domain Specific Language is used to model the environment in a user-friendly way; the latter is then compiled to trace expressions for both static formal verification and runtime verification. To evaluate our approach, we apply it to two different case studies: an autonomous cruise control system and a simulation of the Mars Curiosity rover.
{"title":"Toward a Holistic Approach to Verification and Validation of Autonomous Cognitive Systems","authors":"Angelo Ferrando, Louise Dennis, R. C. Cardoso, Michael Fisher, D. Ancona, V. Mascardi","doi":"10.1145/3447246","DOIUrl":"https://doi.org/10.1145/3447246","url":null,"abstract":"When applying formal verification to a system that interacts with the real world, we must use a model of the environment. This model represents an abstraction of the actual environment, so it is necessarily incomplete and hence presents an issue for system verification. If the actual environment matches the model, then the verification is correct; however, if the environment falls outside the abstraction captured by the model, then we cannot guarantee that the system is well behaved. A solution to this problem consists in exploiting the model of the environment used for statically verifying the system’s behaviour and, if the verification succeeds, using it also for validating the model against the real environment via runtime verification. The article discusses this approach and demonstrates its feasibility by presenting its implementation on top of a framework integrating the Agent Java PathFinder model checker. A high-level Domain Specific Language is used to model the environment in a user-friendly way; the latter is then compiled to trace expressions for both static formal verification and runtime verification. To evaluate our approach, we apply it to two different case studies: an autonomous cruise control system and a simulation of the Mars Curiosity rover.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"141 1","pages":"1 - 43"},"PeriodicalIF":0.0,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80420693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanjie Zhao, Li Li, Haoyu Wang, Haipeng Cai, Tegawendé F. Bissyandé, Jacques Klein, J. Grundy
Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded experimental results and insights. In this article, we perform extensive experiments to measure the performance gap that occurs when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the finding of Allamanis on the general case of machine learning bias for big code. Our experiments, however, show that sample duplication more substantially affects unsupervised learning models (e.g., malware family clustering). Nevertheless, we argue that our fellow researchers and practitioners should always take sample duplication into consideration when performing machine-learning-based (via either supervised or unsupervised learning) Android malware detections, no matter how significant the impact might be.
{"title":"On the Impact of Sample Duplication in Machine-Learning-Based Android Malware Detection","authors":"Yanjie Zhao, Li Li, Haoyu Wang, Haipeng Cai, Tegawendé F. Bissyandé, Jacques Klein, J. Grundy","doi":"10.1145/3446905","DOIUrl":"https://doi.org/10.1145/3446905","url":null,"abstract":"Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded experimental results and insights. In this article, we perform extensive experiments to measure the performance gap that occurs when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the finding of Allamanis on the general case of machine learning bias for big code. Our experiments, however, show that sample duplication more substantially affects unsupervised learning models (e.g., malware family clustering). Nevertheless, we argue that our fellow researchers and practitioners should always take sample duplication into consideration when performing machine-learning-based (via either supervised or unsupervised learning) Android malware detections, no matter how significant the impact might be.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"43 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2021-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81399151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Silverio Mart'inez-Fern'andez, J. Bogner, Xavier Franch, M. Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, S. Wagner
AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image-, speech-recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for: researchers, to quickly understand the state-of-the-art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and, educators, to bridge the gap among SE and AI in their curricula.
{"title":"Software Engineering for AI-Based Systems: A Survey","authors":"Silverio Mart'inez-Fern'andez, J. Bogner, Xavier Franch, M. Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, S. Wagner","doi":"10.1145/3487043","DOIUrl":"https://doi.org/10.1145/3487043","url":null,"abstract":"AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image-, speech-recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for: researchers, to quickly understand the state-of-the-art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and, educators, to bridge the gap among SE and AI in their curricula.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"32 1","pages":"1 - 59"},"PeriodicalIF":0.0,"publicationDate":"2021-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85041650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simone Scalabrino, A. Mastropaolo, G. Bavota, R. Oliveto
Search-based techniques have been successfully used to automate test case generation. Such approaches allocate a fixed search budget to generate test cases aiming at maximizing code coverage. The search budget plays a crucial role; due to the hugeness of the search space, the higher the assigned budget, the higher the expected coverage. Code components have different structural properties that may affect the ability of search-based techniques to achieve a high coverage level. Thus, allocating a fixed search budget for all the components is not recommended and a component-specific search budget should be preferred. However, deciding the budget to assign to a given component is not a trivial task. In this article, we introduce Budget Optimization for Testing (BOT), an approach to adaptively allocate the search budget to the classes under test. BOT requires information about the branch coverage that will be achieved on each class with a given search budget. Therefore, we also introduce BRANCHOS, an approach that predicts coverage in a budget-aware way. The results of our experiments show that (i) BRANCHOS can approximate the branch coverage in time with a low error, and (ii) BOT can significantly increase the coverage achieved by a test generation tool and the effectiveness of generated tests.
{"title":"An Adaptive Search Budget Allocation Approach for Search-Based Test Case Generation","authors":"Simone Scalabrino, A. Mastropaolo, G. Bavota, R. Oliveto","doi":"10.1145/3446199","DOIUrl":"https://doi.org/10.1145/3446199","url":null,"abstract":"Search-based techniques have been successfully used to automate test case generation. Such approaches allocate a fixed search budget to generate test cases aiming at maximizing code coverage. The search budget plays a crucial role; due to the hugeness of the search space, the higher the assigned budget, the higher the expected coverage. Code components have different structural properties that may affect the ability of search-based techniques to achieve a high coverage level. Thus, allocating a fixed search budget for all the components is not recommended and a component-specific search budget should be preferred. However, deciding the budget to assign to a given component is not a trivial task. In this article, we introduce Budget Optimization for Testing (BOT), an approach to adaptively allocate the search budget to the classes under test. BOT requires information about the branch coverage that will be achieved on each class with a given search budget. Therefore, we also introduce BRANCHOS, an approach that predicts coverage in a budget-aware way. The results of our experiments show that (i) BRANCHOS can approximate the branch coverage in time with a low error, and (ii) BOT can significantly increase the coverage achieved by a test generation tool and the effectiveness of generated tests.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"42 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78627518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiuyuan Chen, Chunyang Chen, Safwat Hassan, Zhengchang Xing, Xin Xia, Ahmed E. Hassan
UI (User Interface) is an essential factor influencing users’ perception of an app. However, it is hard for even professional designers to determine if the UI is good or not for end-users. Users’ feedback (e.g., user reviews in the Google Play) provides a way for app owners to understand how the users perceive the UI. In this article, we conduct an in-depth empirical study to analyze the UI issues of mobile apps. In particular, we analyze more than 3M UI-related reviews from 22,199 top free-to-download apps and 9,380 top non-free apps in the Google Play Store. By comparing the rating of UI-related reviews and other reviews of an app, we observe that UI-related reviews have lower ratings than other reviews. By manually analyzing a random sample of 1,447 UI-related reviews with a 95% confidence level and a 5% interval, we identify 17 UI-related issues types that belong to four categories (i.e., “Appearance,” “Interaction,” “Experience,” and “Others”). In these issue types, we find “Generic Review” is the most occurring one. “Comparative Review” and “Advertisement” are the most negative two UI issue types. Faced with these UI issues, we explore the patterns of interaction between app owners and users. We identify eight patterns of how app owners dialogue with users about UI issues by the review-response mechanism. We find “Apology or Appreciation” and “Information Request” are the most two frequent patterns. We find updating UI timely according to feedback is essential to satisfy users. Besides, app owners could also fix UI issues without updating UI, especially for issue types belonging to “Interaction” category. Our findings show that there exists a positive impact if app owners could actively interact with users to improve UI quality and boost users’ satisfactoriness about the UIs.
UI(用户界面)是影响用户对应用感知的重要因素。然而,即使是专业的设计师也很难确定UI对最终用户来说是好是坏。用户反馈(如Google Play中的用户评论)为应用所有者提供了一种了解用户如何看待UI的方法。在这篇文章中,我们进行了深入的实证研究来分析移动应用的UI问题。我们特别分析了Google Play Store中22199款热门免费应用和9380款热门非免费应用的超过3M条ui相关评论。通过比较ui相关评论和其他评论的评分,我们发现ui相关评论的评分低于其他评论。通过手动分析1447个ui相关评论的随机样本,置信水平为95%,间隔为5%,我们确定了17个ui相关问题类型,属于四个类别(即,“外观”,“交互”,“体验”和“其他”)。在这些问题类型中,我们发现“Generic Review”是最常见的一种。“比较评论”和“广告”是最负面的两种UI问题类型。面对这些UI问题,我们探索了应用程序所有者和用户之间的交互模式。我们确定了应用程序所有者如何通过评论-响应机制与用户就UI问题进行对话的八种模式。我们发现“道歉或感谢”和“信息请求”是最常见的两种模式。我们发现根据用户反馈及时更新UI是满足用户需求的关键。此外,应用程序所有者也可以在不更新UI的情况下修复UI问题,特别是属于“交互”类别的问题类型。我们的研究结果表明,如果应用程序所有者能够积极地与用户互动,以改善UI质量并提高用户对UI的满意度,那么就会产生积极的影响。
{"title":"How Should I Improve the UI of My App?","authors":"Qiuyuan Chen, Chunyang Chen, Safwat Hassan, Zhengchang Xing, Xin Xia, Ahmed E. Hassan","doi":"10.1145/3447808","DOIUrl":"https://doi.org/10.1145/3447808","url":null,"abstract":"UI (User Interface) is an essential factor influencing users’ perception of an app. However, it is hard for even professional designers to determine if the UI is good or not for end-users. Users’ feedback (e.g., user reviews in the Google Play) provides a way for app owners to understand how the users perceive the UI. In this article, we conduct an in-depth empirical study to analyze the UI issues of mobile apps. In particular, we analyze more than 3M UI-related reviews from 22,199 top free-to-download apps and 9,380 top non-free apps in the Google Play Store. By comparing the rating of UI-related reviews and other reviews of an app, we observe that UI-related reviews have lower ratings than other reviews. By manually analyzing a random sample of 1,447 UI-related reviews with a 95% confidence level and a 5% interval, we identify 17 UI-related issues types that belong to four categories (i.e., “Appearance,” “Interaction,” “Experience,” and “Others”). In these issue types, we find “Generic Review” is the most occurring one. “Comparative Review” and “Advertisement” are the most negative two UI issue types. Faced with these UI issues, we explore the patterns of interaction between app owners and users. We identify eight patterns of how app owners dialogue with users about UI issues by the review-response mechanism. We find “Apology or Appreciation” and “Information Request” are the most two frequent patterns. We find updating UI timely according to feedback is essential to satisfy users. Besides, app owners could also fix UI issues without updating UI, especially for issue types belonging to “Interaction” category. Our findings show that there exists a positive impact if app owners could actively interact with users to improve UI quality and boost users’ satisfactoriness about the UIs.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"61 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83795846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blockchain offers a distributed ledger to record data collected from Internet of Thing (IoT) devices as immutable and tamper-proof transactions and securely shared among authorized participants in a Peer-to-Peer (P2P) network. Despite the growing interest in using blockchain for securing IoT systems, there is a general lack of systematic research and comprehensive review of the design issues on the integration of blockchain and IoT from the software architecture perspective. This article presents a catalog of architectural tactics for the design of IoT systems supported by blockchain as a result of a Systematic Literature Review (SLR) on IoT and blockchain to extract the commonly reported quality attributes, design decisions, and relevant architectural tactics for the architectural design of this category of systems. Our findings are threefold: (i) identification of security, scalability, performance, and interoperability as the commonly reported quality attributes; (ii) a catalog of twelve architectural tactics for the design of IoT systems supported by blockchain; and (iii) gaps in research that include tradeoffs among quality attributes and identified tactics. These tactics might provide architects and designers with different options when searching for an optimal architectural design that meets the quality attributes of interest and constraints of a system.
{"title":"Architecting Internet of Things Systems with Blockchain","authors":"Wendy Yánez, R. Bahsoon, Yuqun Zhang, R. Kazman","doi":"10.1145/3442412","DOIUrl":"https://doi.org/10.1145/3442412","url":null,"abstract":"Blockchain offers a distributed ledger to record data collected from Internet of Thing (IoT) devices as immutable and tamper-proof transactions and securely shared among authorized participants in a Peer-to-Peer (P2P) network. Despite the growing interest in using blockchain for securing IoT systems, there is a general lack of systematic research and comprehensive review of the design issues on the integration of blockchain and IoT from the software architecture perspective. This article presents a catalog of architectural tactics for the design of IoT systems supported by blockchain as a result of a Systematic Literature Review (SLR) on IoT and blockchain to extract the commonly reported quality attributes, design decisions, and relevant architectural tactics for the architectural design of this category of systems. Our findings are threefold: (i) identification of security, scalability, performance, and interoperability as the commonly reported quality attributes; (ii) a catalog of twelve architectural tactics for the design of IoT systems supported by blockchain; and (iii) gaps in research that include tradeoffs among quality attributes and identified tactics. These tactics might provide architects and designers with different options when searching for an optimal architectural design that meets the quality attributes of interest and constraints of a system.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"7 1","pages":"1 - 46"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75233032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thiago M. Castro, Leopoldo Teixeira, Vander Alves, S. Apel, Maxime Cordy, Rohit Gheyi
A number of product-line analysis approaches lift analyses such as type checking, model checking, and theorem proving from the level of single programs to the level of product lines. These approaches share concepts and mechanisms that suggest an unexplored potential for reuse of key analysis steps and properties, implementation, and verification efforts. Despite the availability of taxonomies synthesizing such approaches, there still remains the underlying problem of not being able to describe product-line analyses and their properties precisely and uniformly. We propose a formal framework that models product-line analyses in a compositional manner, providing an overall understanding of the space of family-based, feature-based, and product-based analysis strategies. It defines precisely how the different types of product-line analyses compose and inter-relate. To ensure soundness, we formalize the framework, providing mechanized specification and proofs of key concepts and properties of the individual analyses. The formalization provides unambiguous definitions of domain terminology and assumptions as well as solid evidence of key properties based on rigorous formal proofs. To qualitatively assess the generality of the framework, we discuss to what extent it describes five representative product-line analyses targeting the following properties: safety, performance, dataflow facts, security, and functional program properties.
{"title":"A Formal Framework of Software Product Line Analyses","authors":"Thiago M. Castro, Leopoldo Teixeira, Vander Alves, S. Apel, Maxime Cordy, Rohit Gheyi","doi":"10.1145/3442389","DOIUrl":"https://doi.org/10.1145/3442389","url":null,"abstract":"A number of product-line analysis approaches lift analyses such as type checking, model checking, and theorem proving from the level of single programs to the level of product lines. These approaches share concepts and mechanisms that suggest an unexplored potential for reuse of key analysis steps and properties, implementation, and verification efforts. Despite the availability of taxonomies synthesizing such approaches, there still remains the underlying problem of not being able to describe product-line analyses and their properties precisely and uniformly. We propose a formal framework that models product-line analyses in a compositional manner, providing an overall understanding of the space of family-based, feature-based, and product-based analysis strategies. It defines precisely how the different types of product-line analyses compose and inter-relate. To ensure soundness, we formalize the framework, providing mechanized specification and proofs of key concepts and properties of the individual analyses. The formalization provides unambiguous definitions of domain terminology and assumptions as well as solid evidence of key properties based on rigorous formal proofs. To qualitatively assess the generality of the framework, we discuss to what extent it describes five representative product-line analyses targeting the following properties: safety, performance, dataflow facts, security, and functional program properties.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"100 1","pages":"1 - 37"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86639802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, Yulei Sui
Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control- and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges.
{"title":"DeepWukong","authors":"Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, Yulei Sui","doi":"10.1145/3436877","DOIUrl":"https://doi.org/10.1145/3436877","url":null,"abstract":"Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control- and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"45 1","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74578619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyu Sun, Li Li, Tegawendé F. Bissyandé, Jacques Klein, Damien Octeau, John C. Grundy
Android developers heavily use reflection in their apps for legitimate reasons. However, reflection is also significantly used for hiding malicious actions. Unfortunately, current state-of-the-art static analysis tools for Android are challenged by the presence of reflective calls, which they usually ignore. Thus, the results of their security analysis, e.g., for private data leaks, are incomplete, given the measures taken by malware writers to elude static detection. We propose a new instrumentation-based approach to address this issue in a non-invasive way. Specifically, we introduce to the community a prototype tool called DroidRA, which reduces the resolution of reflective calls to a composite constant propagation problem and then leverages the COAL solver to infer the values of reflection targets. After that, it automatically instruments the app to replace reflective calls with their corresponding Java calls in a traditional paradigm. Our approach augments an app so that it can be more effectively statically analyzable, including by such static analyzers that are not reflection-aware. We evaluate DroidRA on benchmark apps as well as on real-world apps, and we demonstrate that it can indeed infer the target values of reflective calls and subsequently allow state-of-the-art tools to provide more sound and complete analysis results.
{"title":"Taming Reflection","authors":"Xiaoyu Sun, Li Li, Tegawendé F. Bissyandé, Jacques Klein, Damien Octeau, John C. Grundy","doi":"10.1145/3440033","DOIUrl":"https://doi.org/10.1145/3440033","url":null,"abstract":"Android developers heavily use reflection in their apps for legitimate reasons. However, reflection is also significantly used for hiding malicious actions. Unfortunately, current state-of-the-art static analysis tools for Android are challenged by the presence of reflective calls, which they usually ignore. Thus, the results of their security analysis, e.g., for private data leaks, are incomplete, given the measures taken by malware writers to elude static detection. We propose a new instrumentation-based approach to address this issue in a non-invasive way. Specifically, we introduce to the community a prototype tool called DroidRA, which reduces the resolution of reflective calls to a composite constant propagation problem and then leverages the COAL solver to infer the values of reflection targets. After that, it automatically instruments the app to replace reflective calls with their corresponding Java calls in a traditional paradigm. Our approach augments an app so that it can be more effectively statically analyzable, including by such static analyzers that are not reflection-aware. We evaluate DroidRA on benchmark apps as well as on real-world apps, and we demonstrate that it can indeed infer the target values of reflective calls and subsequently allow state-of-the-art tools to provide more sound and complete analysis results.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"12 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87912859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}