Formal Verification of Masking Countermeasures for Arithmetic Programs
Pengfei Gao
Cryptographic algorithms are widely used to protect data privacy in many aspects of daily life, from smart cards to cyber-physical systems. Unfortunately, programs implementing cryptographic algorithms may be vulnerable to practical power side-channel attacks, which infer private data via statistical analysis of the correlation between the power consumption of an electronic device and the private data it processes. To thwart these attacks, several masking schemes have been proposed. However, programs that rely on secure masking schemes are not secure a priori. Although some techniques have been proposed for formally verifying masking countermeasures and for quantifying masking strength, they are currently limited to Boolean programs and suffer from low accuracy. In this work, we propose an approach for formally verifying masking countermeasures of arithmetic programs. Our approach is more accurate for arithmetic programs and more scalable for Boolean programs compared to existing approaches. We have implemented our methods in a verification tool, QMVERIF, which has been extensively evaluated on cryptographic benchmarks including full AES, DES, and MAC-Keccak. The experimental results demonstrate the effectiveness and efficiency of our approach, especially for compositional reasoning.
{"title":"Formal Verification of Masking Countermeasures for Arithmetic Programs*","authors":"Pengfei Gao","doi":"10.1145/3324884.3418920","DOIUrl":"https://doi.org/10.1145/3324884.3418920","url":null,"abstract":"Cryptographic algorithms are widely used to protect data privacy in many aspects of daily lives from smart card to cyber-physical systems. Unfortunately, programs implementing cryptographic algorithms may be vulnerable to practical power side-channel attacks, which may infer private data via statistical analysis of the correlation between power consumptions of an electronic device and private data. To thwart these attacks, several masking schemes have been proposed. However, programs that rely on secure masking schemes are not secure a priori. Although some techniques have been proposed for formally verifying masking countermeasures and for quantifying masking strength, they are currently limited to Boolean programs and suffer from low accuracy. In this work, we propose an approach for formally verifying masking countermeasures of arithmetic programs. Our approach is more accurate for arithmetic programs and more scalable for Boolean programs comparing to the existing approaches. We have implemented our methods in a verification tool QMVERIF which has been extensively evaluated on cryptographic benchmarks including full AES, DES and MAC-Keccak. The experimental results demonstrate the effectiveness and efficiency of our approach, especially for compositional reasoning.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116999575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining
Weijun Shen, Yanhui Li, Lin Chen, Yuanlei Han, Yuming Zhou, Baowen Xu
With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data used to construct the models and the testing data used to evaluate them. To bridge this gap, testers aim to collect, with limited labeling effort, an effective subset of inputs from the testing contexts for retraining DL models. To assist in this subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique that clusters test samples into the boundary areas of a DL model's multiple decision boundaries and prioritizes them so that samples are selected evenly from all boundary areas, ensuring enough useful samples for reconstructing each boundary. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experimental results show that, compared with state-of-the-art baseline methods, MCP performs significantly better on effectiveness, measured by the improved quality of the retrained DL models, and also has an advantage in time cost on efficiency.
{"title":"Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining","authors":"Weijun Shen, Yanhui Li, Lin Chen, Yuanlei Han, Yuming Zhou, Baowen Xu","doi":"10.1145/3324884.3416621","DOIUrl":"https://doi.org/10.1145/3324884.3416621","url":null,"abstract":"With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the testing data to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models.To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique to cluster test samples into the boundary areas of multiple boundaries for DL models and specify the priority to select samples evenly from all boundary areas, to make sure enough useful samples for each boundary reconstruction. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experiment results show that, compared with state-of-the-art baseline methods, on effectiveness, our approach MCP has a significantly better performance by evaluating the improved quality of retrained DL models; on efficiency, MCP also has the advantages in time costs.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117156871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Demystifying Diehard Android Apps
Hao Zhou, Haoyu Wang, Yajin Zhou, Xiapu Luo, Yutian Tang, Lei Xue, Ting Wang
Smartphone vendors use multiple methods to kill the processes of Android apps to reduce battery consumption. This motivates developers to find ways to extend the liveness of their apps, hence the name "diehard apps" in this paper. Although there are blogs and articles illustrating methods to achieve this purpose, there is no systematic research on them. More importantly, little is known about the prevalence of diehard apps in the wild. In this paper, we take a first step toward systematically investigating diehard apps by answering the following research questions. First, why and how can they circumvent the resource-saving mechanisms of Android? Second, how prevalent are they in the wild? In particular, we conduct a semi-automated analysis to explain why existing methods to kill app processes can be evaded, and then systematically present 12 diehard methods. After that, we develop a system named DiehardDetector to detect diehard apps at a large scale. Applying DiehardDetector to more than 80k Android apps downloaded from Google Play showed that around 21% of apps adopt various diehard methods. Moreover, our system achieves high precision and recall.
{"title":"Demystifying Diehard Android Apps","authors":"Hao Zhou, Haoyu Wang, Yajin Zhou, Xiapu Luo, Yutian Tang, Lei Xue, Ting Wang","doi":"10.1145/3324884.3416637","DOIUrl":"https://doi.org/10.1145/3324884.3416637","url":null,"abstract":"Smartphone vendors are using multiple methods to kill processes of Android apps to reduce the battery consumption. This motivates developers to find ways to extend the liveness time of their apps, hence the name diehard apps in this paper. Although there are blogs and articles illustrating methods to achieve this purpose, there is no systematic research about them. What's more important, little is known about the prevalence of diehard apps in the wild. In this paper, we take a first step to systematically investigate diehard apps by answering the following research questions. First, why and how can they circumvent the resource-saving mechanisms of Android? Second, how prevalent are they in the wild? In particular, we conduct a semi -automated analysis to illustrate insights why existing methods to kill app processes could be evaded, and then systematically present 12 diehard methods. After that, we develop a system named DiehardDetector to detect diehard apps in a large scale. The experimental result of applying DiehardDetector to more than 80k Android apps downloaded from Google Play showed that around 21 % of apps adopt various diehard methods. Moreover, our system can achieve high precision and recall.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115262844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Just-In-Time Reactive Synthesis
S. Maoz, Ilia Shevrin
Reactive synthesis is an automated procedure to obtain a correct-by-construction reactive system from its temporal logic specification. GR(1) is an expressive assume-guarantee fragment of LTL that enables efficient synthesis and has recently been used in different contexts and application domains. In this work we present just-in-time synthesis (JITS) for GR(1), a novel means to execute synthesized reactive systems. Rather than constructing a controller at synthesis time, we compute next states during system execution, and only when they are required. We prove that JITS does not compromise the correctness of the synthesized system execution. We further show that the basic algorithm can be extended to enable several variants. We have implemented JITS in the Spectra synthesizer. Our evaluation, comparing JITS to existing tools over known benchmark specifications, shows that JITS reduces (1) total synthesis time, (2) the size of the synthesis output, and (3) the loading time for system execution, all while having little to no effect on system execution performance.
{"title":"Just-In-Time Reactive Synthesis","authors":"S. Maoz, Ilia Shevrin","doi":"10.1145/3324884.3416557","DOIUrl":"https://doi.org/10.1145/3324884.3416557","url":null,"abstract":"Reactive synthesis is an automated procedure to obtain a correct-by-construction reactive system from its temporal logic specification. GR(1) is an expressive assume-guarantee fragment of LTL that enables efficient synthesis and has been recently used in different contexts and application domains. In this work we present just-in-time synthesis (JITS) for GR(1), a novel means to execute synthesized reactive systems. Rather than constructing a controller at synthesis time, we compute next states during system execution, and only when they are required. We prove that JITS does not compromise the correctness of the synthesized system execution. We further show that the basic algorithm can be extended to enable several variants. We have implemented JITS in the Spectra synthesizer. Our evaluation, comparing JITS to existing tools over known benchmark specifications, shows that JITS reduces (1) total synthesis time, (2) the size of the synthesis output, and (3) the loading time for system execution, all while having little to no effect on system execution performance.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126054524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting Code Context Models for Software Development Tasks
Zhiyuan Wan, G. Murphy, Xin Xia
Code context models consist of source code elements and their relations relevant to a development task. Prior research has shown that making code context models explicit in software tools can benefit software development practices, such as code navigation and search. However, little focus has been put on how to proactively form code context models. In this paper, we explore the proactive formation of code context models based on the topological patterns of code elements found in the interaction histories of a project. Specifically, we first learn abstract topological patterns based on the stereotype roles of code elements, rather than on specific code elements; we then leverage the learned patterns to predict the code context model for a given task by graph pattern matching. To determine the effectiveness of this approach, we applied it to the interaction histories stored for the Eclipse Mylyn open source project. We found that our approach achieves maximum F-measures of 0.67, 0.33, and 0.21 for 1-step, 2-step, and 3-step predictions, respectively. The most similar approach to ours is Suade, which supports 1-step prediction only. In comparison to this existing work, our approach predicts code context models with a significantly higher F-measure (0.57 vs. 0.23 on average). The results demonstrate the value of integrating historical and structural approaches to form more accurate code context models.
{"title":"Predicting Code Context Models for Software Development Tasks","authors":"Zhiyuan Wan, G. Murphy, Xin Xia","doi":"10.1145/3324884.3416544","DOIUrl":"https://doi.org/10.1145/3324884.3416544","url":null,"abstract":"Code context models consist of source code elements and their relations relevant to a development task. Prior research showed that making code context models explicit in software tools can benefit software development practices, e.g., code navigation and searching. However, little focus has been put on how to proactively form code context models. In this paper, we explore the proactive formation of code context models based on the topological patterns of code elements from interaction histories for a project. Specifically, we first learn abstract topological patterns based on the stereotype roles of code elements, rather than on specific code elements; we then leverage the learned patterns to predict the code context models for a given task by graph pattern matching. To determine the effectiveness of this approach, we applied the approach to interaction histories stored for the Eclipse Mylyn open source project. We found that our approach achieves maximum F-measures of 0.67, 0.33 and 0.21 for 1-step, 2-step and 3-step predictions, respectively. The most similar approach to ours is Suade, which supports 1-step prediction only. In comparison to this existing work, our approach predicts code context models with significantly higher F-measure (0.57 over 0.23 on average). The results demonstrate the value of integrating historical and structural approaches to form more accurate code context models.CCS CONCEPTS • Software and its engineering → Development frameworks and environments; • Human-centered computing → Human computer interaction (HCI).","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124089510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Where Shall We Log? Studying and Suggesting Logging Locations in Code Blocks
Zhenhao Li, T. Chen, Weiyi Shang
Developers write logging statements to generate logs and record system execution behaviors to assist in debugging and software maintenance. However, deciding where to insert logging statements is a crucial yet challenging task. On one hand, logging too little may increase the maintenance difficulty due to missing important system execution information. On the other hand, logging too much may introduce excessive logs that mask the real problems and cause significant performance overhead. Prior studies provide recommendations on logging locations, but such recommendations are only for limited situations (e.g., exception logging) or at a coarse-grained level (e.g., method level). Thus, properly helping developers decide finer-grained logging locations for different situations remains an unsolved challenge. In this paper, we tackle the challenge by first conducting a comprehensive manual study on the characteristics of logging locations in seven open-source systems. We uncover six categories of logging locations and find that developers usually insert logging statements to record execution information in various types of code blocks. Based on the observed patterns, we then propose a deep learning framework to automatically suggest logging locations at the block level. We model the source code at the code block level using the syntactic and semantic information. We find that: 1) our models achieve an average of 80.1% balanced accuracy when suggesting logging locations in blocks; 2) our cross-system logging suggestion results reveal that there might be an implicit logging guideline across systems. Our results show that we may accurately provide finer-grained suggestions on logging locations, and such suggestions may be shared across systems.
{"title":"Where Shall We Log? Studying and Suggesting Logging Locations in Code Blocks","authors":"Zhenhao Li, T. Chen, Weiyi Shang","doi":"10.1145/3324884.3416636","DOIUrl":"https://doi.org/10.1145/3324884.3416636","url":null,"abstract":"Developers write logging statements to generate logs and record system execution behaviors to assist in debugging and software maintenance. However, deciding where to insert logging statements is a crucial yet challenging task. On one hand, logging too little may increase the maintenance difficulty due to missing important system execution information. On the other hand, logging too much may introduce excessive logs that mask the real problems and cause significant performance overhead. Prior studies provide recommendations on logging locations, but such recommendations are only for limited situations (e.g., exception logging) or at a coarse-grained level (e.g., method level). Thus, properly helping developers decide finer-grained logging locations for different situations remains an unsolved challenge. In this paper, we tackle the challenge by first conducting a comprehensive manual study on the characteristics of logging locations in seven open-source systems. We uncover six categories of logging locations and find that developers usually insert logging statements to record execution information in various types of code blocks. Based on the observed patterns, we then propose a deep learning framework to automatically suggest logging locations at the block level. We model the source code at the code block level using the syntactic and semantic information. We find that: 1) our models achieve an average of 80.1% balanced accuracy when suggesting logging locations in blocks; 2) our cross-system logging suggestion results reveal that there might be an implicit logging guideline across systems. Our results show that we may accurately provide finer-grained suggestions on logging locations, and such suggestions may be shared across systems.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126428003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code to Comment “Translation”: Data, Metrics, Baselining & Evaluation
David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu
The relationship of comments to code, and in particular the task of generating useful comments given the code, has long been of interest. The earliest approaches were based on strong syntactic theories of comment structures and relied on textual templates. More recently, researchers have applied deep-learning methods to this task, specifically trainable generative translation models that are known to work very well for natural language translation (e.g., from German to English). We carefully examine the underlying assumption here: that the task of generating comments sufficiently resembles the task of translating between natural languages, so that similar models and evaluation metrics could be used. We analyze several recent code-comment datasets for this task: CODENN, DEEPCOM, FUNCOM, and Docstring. We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators. We found some interesting differences between the code-comment data and the WMT19 natural language data. Next, we describe and conduct studies to calibrate BLEU (which is commonly used as a measure of comment quality), using “affinity pairs” of methods from different projects, in the same project, in the same class, etc. Our study suggests that the current performance on some datasets might need to be improved substantially. We also argue that fairly naive information retrieval (IR) methods do well enough at this task to be considered a reasonable baseline. Finally, we make some suggestions on how our findings might be used in future research in this area.
{"title":"Code to Comment “Translation”: Data, Metrics, Baselining & Evaluation","authors":"David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu","doi":"10.1145/3324884.3416546","DOIUrl":"https://doi.org/10.1145/3324884.3416546","url":null,"abstract":"The relationship of comments to code, and in particular, the task of generating useful comments given the code, has long been of interest. The earliest approaches have been based on strong syntactic theories of comment-structures, and relied on textual templates. More recently, researchers have applied deep-learning methods to this task-specifically, trainable generative translation models which are known to work very well for Natural Language translation (e.g., from German to English). We carefully examine the underlying assumption here: that the task of generating comments sufficiently resembles the task of translating between natural languages, and so similar models and evaluation metrics could be used. We analyze several recent code-comment datasets for this task: CODENN, DEEPCOM, FUNCOM, and Docstring. We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators. We found some interesting differences between the code-comment data and the WMT19 natural language data. Next, we describe and conduct some studies to calibrate BLEU (which is commonly used as a measure of comment quality). using “affinity pairs” of methods, from different projects, in the same project, in the same class, etc; Our study suggests that the current performance on some datasets might need to be improved substantially. We also argue that fairly naive information retrieval (IR) methods do well enough at this task to be considered a reasonable baseline. Finally, we make some suggestions on how our findings might be used in future research in this area.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131230673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Revisiting the Relationship Between Fault Detection, Test Adequacy Criteria, and Test Set Size
Yiqun T. Chen, Rahul Gopinath, Anita Tadakamalla, Michael D. Ernst, Reid Holmes, G. Fraser, P. Ammann, René Just
The research community has long recognized a complex interrelationship between fault detection, test adequacy criteria, and test set size. However, there is substantial confusion about whether and how to experimentally control for test set size when assessing how well an adequacy criterion is correlated with fault detection and when comparing test adequacy criteria. Resolving the confusion, this paper makes the following contributions: (1) A review of contradictory analyses of the relationships between fault detection, test adequacy criteria, and test set size. Specifically, this paper addresses the supposed contradiction of prior work and explains why test set size is neither a confounding variable, as previously suggested, nor an independent variable that should be experimentally manipulated. (2) An explication and discussion of the experimental designs of prior work, together with a discussion of conceptual and statistical problems, as well as specific guidelines for future work. (3) A methodology for comparing test adequacy criteria on an equal basis, which accounts for test set size without directly manipulating it through unrealistic stratification. (4) An empirical evaluation that compares the effectiveness of coverage-based testing, mutation-based testing, and random testing. Additionally, this paper proposes probabilistic coupling, a methodology for assessing the representativeness of a set of test goals for a given fault and for approximating the fault-detection probability of adequate test sets.
{"title":"Revisiting the Relationship Between Fault Detection, Test Adequacy Criteria, and Test Set Size","authors":"Yiqun T. Chen, Rahul Gopinath, Anita Tadakamalla, Michael D. Ernst, Reid Holmes, G. Fraser, P. Ammann, René Just","doi":"10.1145/3324884.3416667","DOIUrl":"https://doi.org/10.1145/3324884.3416667","url":null,"abstract":"The research community has long recognized a complex interrelationship between fault detection, test adequacy criteria, and test set size. However, there is substantial confusion about whether and how to experimentally control for test set size when assessing how well an adequacy criterion is correlated with fault detection and when comparing test adequacy criteria. Resolving the confusion, this paper makes the following contributions: (1) A review of contradictory analyses of the relationships between fault detection, test adequacy criteria, and test set size. Specifically, this paper addresses the supposed contradiction of prior work and explains why test set size is neither a confounding variable, as previously suggested, nor an independent variable that should be experimentally manipulated. (2) An explication and discussion of the experimental designs of prior work, together with a discussion of conceptual and statistical problems, as well as specific guidelines for future work. (3) A methodology for comparing test adequacy criteria on an equal basis, which accounts for test set size without directly manipulating it through unrealistic stratification. (4) An empirical evaluation that compares the effectiveness of coverage-based testing, mutation-based testing, and random testing. Additionally, this paper proposes probabilistic coupling, a methodology for assessing the representativeness of a set of test goals for a given fault and for approximating the fault-detection probability of adequate test sets.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123963574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Team Discussions and Dynamics During DevOps Tool Adoptions in OSS Projects
Likang Yin, V. Filkov
In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, often more than one for each task. Tools are adopted (and given up) by OSS projects regularly. Prior work has shown that some tool adoptions are preceded by discussions, and that tool adoptions can benefit the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption, and for how long? And what team characteristics are determinants of the adoption? In this paper, we employ a large-scale empirical study to characterize the team discussions and to discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, conduct deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on the adoption. From the commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with tool adoption, while longer discussions and the number of newer team members associate negatively. These results can provide guidance, beyond technical appropriateness, on the timeliness of tool adoptions in diverse programmer teams. Our data and code are available at https://github.com/lkyin/tool_adoptions.
{"title":"Team Discussions and Dynamics During DevOps Tool Adoptions in OSS Projects","authors":"Likang Yin, V. Filkov","doi":"10.1145/3324884.3416640","DOIUrl":"https://doi.org/10.1145/3324884.3416640","url":null,"abstract":"In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, and often more than one for each task. Tools are adopted (and given up) by OSS projects regularly. Prior work has shown that some tool adoptions are preceded by discussions, and that tool adoptions can result in benefits to the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption and for how long? And, what team characteristics are determinant of the adoption?In this paper, we employ a large-scale empirical study in order to characterize the team discussions and to discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, do deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on the adoption. From data of commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with the tool adoption, while longer discussions and the number of newer team members associate negatively. These results can provide guidance beyond the technical appropriateness for the timeliness of tool adoptions in diverse programmer teams. Our data and code is available at https://github.com/lkyin/tool_adoptions. In this paper, we employ a large-scale empirical study in order to characterize the team discussions and to discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, do deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on the adoption. From data of commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with the tool adoption, while longer discussions and the number of newer team members associate negatively. These results can provide guidance beyond the technical appropriateness for the timeliness of tool adoptions in diverse programmer teams. Our data and code is available at https://github.com/lkyin/tool_adoptions.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"359 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124527024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Machine Learning based Approach to Autogenerate Diagnostic Models for CNC machines
K. Masalimov
This article describes a system for the automatic generation of predictive diagnostic models for CNC machine tools. The system allows machine tool maintenance specialists to select and operate models, based on LSTM neural networks, that determine the state of CNC machine elements. Examples are given of the accuracy these models achieve during operation in determining the state of the cutting tool (more than 95%) and of the electric motor bearings (more than 91%).
{"title":"A Machine Learning based Approach to Autogenerate Diagnostic Models for CNC machines","authors":"K. Masalimov","doi":"10.1145/3324884.3418915","DOIUrl":"https://doi.org/10.1145/3324884.3418915","url":null,"abstract":"This article presents a description of a system for the automatic generation of predictive diagnostic models of CNC machine tools. This system allows machine tool maintenance specialists to select and operate models based on LSTM neural networks to determine the state of elements of CNC machines. Examples of changes in the accuracy of the models used during operation are given to determine the state of the cutting tool (more than 95%) and the bearings of electric motors (more than 91%).","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127916607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}