UIPDroid. Mulin Duan, Lingxiao Jiang, Lwin Khin Shar, Debin Gao. DOI: https://doi.org/10.1145/3510454.3516844
Proper permission controls in Android systems are important for protecting users' private data when running installed applications. Currently, Android requires apps to obtain authorization from users the first time they try to access sensitive data, but permissions are managed only at the application level, allowing apps to (mis)use permissions granted at the beginning for different purposes later without informing users. Based on privacy-by-design principles, this paper develops a new permission manager, named UIPDroid, that (1) enforces users' basic right-to-know through user interfaces whenever an app uses a permission, and (2) provides finer-grained, UI widget-level permission control that can allow, deny, or produce fake private data dynamically for each permission use, at the user's choice, even if the permission has already been granted at the application level. In addition, to make the tool easier for end users, and unlike root-based solutions, our solution is root-free: it is developed as a module on top of a virtualization framework and can be installed on users' devices as an ordinary app. Our preliminary evaluation shows that UIPDroid works well for fine-grained, per-widget control of the contact and location permissions implemented in the prototype, improving users' privacy awareness and protection. The tool is available at https://github.com/pangdingzhang/Anti-Beholder; a demo video is at https://youtu.be/dT-mq4oasNU.
DScribe. Alexa Hernandez, M. Nassif, M. Robillard. DOI: https://doi.org/10.1145/3510454.3516856
Test suites and documentation capture similar information despite serving distinct purposes. Such redundancy introduces the risk that the artifacts inconsistently capture specifications. We present DScribe, an approach that leverages the redundant information in tests and documentation to reduce the cost of creating them and the threat of inconsistencies. DScribe allows developers to define simple templates that jointly capture the structure to test and document a specification. They can then use these templates to generate consistent and checkable tests and documentation. By linking documentation to unit tests, DScribe ensures documentation accuracy as outdated documentation is flagged by failing tests. DScribe’s template-based approach also enforces a uniform style throughout the artifacts. Hence, in addition to reducing developer effort, DScribe improves artifact quality by ensuring consistent content and style. Video: https://www.youtube.com/watch?v=CUKp3-MjMog
ICCBot. Jiwei Yan, Shixin Zhang, Yepang Liu, Jun Yan, Jian Zhang. DOI: https://doi.org/10.1145/3510454.3516864
gDefects4DL. Yunkai Liang, Yun Lin, Xuezhi Song, Jun Sun, Zhiyong Feng, J. Dong. DOI: https://doi.org/10.1145/3510454.3516826
Deep learning programs, as a new programming paradigm, are observed to suffer from various defects. Emerging research has proposed techniques to detect, debug, and repair deep learning bugs, which drives the need for bug benchmarks. In this work, we present gDefects4DL, a dataset of general bugs in deep learning programs. Compared to existing datasets, gDefects4DL collects bugs whose root causes and fix solutions generalize well to other projects. Our general bugs include (1) violations of deep learning API usage patterns (e.g., implementing the cross-entropy term y·log(y) without NaN errors as y → 0), (2) shape mismatches in tensor calculations, (3) numeric bugs, (4) type mismatches (e.g., confusing similar types among numpy, pytorch, and tensorflow), (5) violations of model architecture design conventions, and (6) performance bugs. For each bug in gDefects4DL, we describe why it is general and group bugs with similar root causes and fix solutions for reference. Moreover, gDefects4DL also maintains (1) each bug's buggy/fixed versions and the isolated fix change, (2) an isolated environment to replicate the defect, and (3) the whole code evolution history from the buggy version to the fixed version. We design gDefects4DL with extensible interfaces to evaluate software engineering methodologies and tools, and have integrated tools such as ShapeFlow, DEBAR, and GRIST. gDefects4DL contains 64 bugs falling into 6 categories (i.e., API Misuse, Shape Mismatch, Number Error, Type Mismatch, Violation of Architecture Convention, and Performance Bug). gDefects4DL is available at https://github.com/llmhyy/defects4dl, its online web demonstration is at http://47.93.14.147:9000/bugList, and the demo video is at https://youtu.be/0XtaEt4Fhm4.
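The API-misuse example mentioned above (computing y·log(y) as y → 0 without NaN) can be reproduced in a few lines; the snippet below shows the buggy pattern and a common epsilon-clamping fix. It illustrates the bug category only and is not taken from any project in the dataset.

```python
# The cross-entropy term y * log(y) naively yields NaN as y -> 0, because
# 0 * log(0) = 0 * (-inf) = NaN. Clamping the argument of log avoids this.
import numpy as np

y = np.array([0.0, 0.25, 0.75])

with np.errstate(divide="ignore", invalid="ignore"):
    buggy = y * np.log(y)                  # [nan, -0.347, -0.216]: 0 * log(0) is NaN

eps = 1e-12
fixed = y * np.log(np.clip(y, eps, 1.0))   # [-0., -0.347, -0.216]: log(0) never evaluated

print(buggy)
print(fixed)
```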
{"title":"gDefects4DL","authors":"Yunkai Liang, Yun Lin, Xuezhi Song, Jun Sun, Zhiyong Feng, J. Dong","doi":"10.1145/3510454.3516826","DOIUrl":"https://doi.org/10.1145/3510454.3516826","url":null,"abstract":"The development of deep learning programs, as a new programming paradigm, is observed to suffer from various defects. Emerging research works have been proposed to detect, debug, and repair deep learning bugs, which drive the need to construct the bug benchmarks. In this work, we present gDefects4DL, a dataset for general bugs of deep learning programs. Comparing to existing datasets, gDefects4DL collects bugs where the root causes and fix solutions can be well generalized to other projects. Our general bugs include deep learning program bugs such as (1) violation of deep learning API usage pattern (e.g., the standard to implement cross entropy function y•log(y), y → 0, without NaN error), (2) shape-mismatch of tensor calculation, (3) numeric bugs, (4) type-mismatch (e.g., confusing similar types among numpy, pytorch, and tensorflow), (5) violation of model architecture design convention, and (6) performance bugs.For each bug in gDefects4DL, we describe why it is general and group the bugs with similar root causes and fix solutions for reference. Moreover, gDefects4DL also maintains (1) its buggy/fixed versions and the isolated fix change, (2) an isolated environment to replicate the defect, and (3) the whole code evolution history from the buggy version to the fixed version. We design gDefects4DL with extensible interfaces to evaluate software engineering methodologies and tools. We have integrated tools such as ShapeFlow, DEBAR, and GRIST. gDefects4DL contains 64 bugs falling into 6 categories (i.e., API Misuse, Shape Mismatch, Number Error, Type Mismatch, Violation of Architecture Convention, and Performance Bug). gDefects4DL is available at https://github.com/llmhyy/defects4dl, its online web demonstration is at http://47.93.14.147:9000/bugList, and the demo video is at https://youtu.be/0XtaEt4Fhm4.","PeriodicalId":326006,"journal":{"name":"Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121885471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ARSearch. K. Luong, Ferdian Thung, D. Lo. DOI: https://doi.org/10.1145/3510454.3517048
Stack Overflow and GitHub are two popular platforms containing API-related resources from which developers learn how to use APIs. The platforms are good sources of information about APIs, such as code examples, usage patterns, sentiment, and bug reports. However, it is difficult to collect the correct resources for a particular API because API method names are ambiguous: a method name mentioned in text refers to only one API, but the same name may match several different APIs. To help developers find the correct resources for a particular API, we introduce ARSearch. ARSearch finds Stack Overflow threads that mention the particular API together with their relevant code examples from GitHub. We demonstrate our tool in a video available at https://youtu.be/Rr-zTfUD_z0.
SEbox4DL. Zhengyuan Wei, Haipeng Wang, Zhen Yang, William Chan. DOI: https://doi.org/10.1145/3510454.3516828
Deep learning (DL) models are widely used in software applications, and novel DL models and datasets are published from time to time. Developers may also be tempted to apply new software engineering (SE) techniques to their DL models. However, no existing tool supports applying software testing and debugging techniques to new DL models and their datasets without modifying code: developers must manually write glue code for every combination of model, dataset, and SE technique and chain them together. We propose SEbox4DL, a novel and modular toolbox that automatically integrates models, datasets, and SE techniques into the SE pipelines seen when developing DL models. SEbox4DL exemplifies six SE pipelines and can be extended with ease. Each user-defined task in a pipeline implements an SE technique as a function with a unified interface, so the overall design of SEbox4DL is generic, modular, and extensible. We have implemented several SE techniques as user-defined tasks to make SEbox4DL usable off the shelf. Our experiments demonstrate that SEbox4DL can simplify the application of software testing and repair techniques to the latest or most popular DL models and datasets. The toolbox is open source and published at https://github.com/Wsine/SEbox4DL. A demonstration video is available at https://youtu.be/EYeFFi4lswc.
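As a rough illustration of the unified task interface described above, the sketch below assumes each SE technique is a function over a shared context that a pipeline runner chains together; this is an assumption for illustration, not SEbox4DL's actual API.

```python
# Minimal sketch of a unified task interface: every task maps a shared context
# (model, dataset, reports, ...) to an updated context, so tasks compose into a
# pipeline without per-combination glue code.
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Task = Callable[[Context], Context]

def run_pipeline(ctx: Context, tasks: List[Task]) -> Context:
    for task in tasks:
        ctx = task(ctx)          # every task uses the same signature
    return ctx

# Two toy user-defined tasks following the unified interface.
def evaluate(ctx: Context) -> Context:
    model, data = ctx["model"], ctx["dataset"]
    ctx["report"] = {"accuracy": sum(model(x) == y for x, y in data) / len(data)}
    return ctx

def mutate_inputs(ctx: Context) -> Context:
    ctx["dataset"] = [(x + 1, y) for x, y in ctx["dataset"]]   # stand-in "testing" step
    return ctx

ctx = {"model": lambda x: x % 2, "dataset": [(0, 0), (1, 1), (2, 0), (3, 1)]}
print(run_pipeline(ctx, [mutate_inputs, evaluate])["report"])
```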
{"title":"SEbox4DL","authors":"Zhengyuan Wei, Haipeng Wang, Zhen Yang, William Chan","doi":"10.1145/3510454.3516828","DOIUrl":"https://doi.org/10.1145/3510454.3516828","url":null,"abstract":"Deep learning (DL) models are widely used in software applications. Novel DL models and datasets are published from time to time. Developers may also tempt to apply new software engineering (SE) techniques on their DL models. However, no existing tool supports the applications of software testing and debugging techniques on new DL models and their datasets without modifying the code. Developers should manually write code to glue every combination of models, datasets, and SE technique and chain them together.We propose SEbox4DL, a novel and modular toolbox that automatically integrates models, datasets, and SE techniques into SE pipelines seen in developing DL models. SEbox4DL exemplifies six SE pipelines and can be extended with ease. Each user-defined task in the pipelines is to implement a SE technique within a function with a unified interface so that the whole design of SEbox4DL is generic, modular, and extensible. We have implemented several SE techniques as user-defined tasks to make SEbox4DL off-the-shelf. Our experiments demonstrate that SEbox4DL can simplify the applications of software testing and repair techniques on the latest or popular DL models and datasets. The toolbox is open-source and published at https://github.com/Wsine/SEbox4DL. A video for demonstration is available at: https://youtu.be/EYeFFi4lswc.","PeriodicalId":326006,"journal":{"name":"Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings","volume":"416 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115918115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gamekins. Philipp Straubinger, G. Fraser. DOI: https://doi.org/10.1145/3510454.3516862
Developers have to write thorough tests for their software in order to find bugs and to prevent regressions. Writing tests, however, is not every developer’s favourite occupation, and if a lack of motivation leads to a lack of tests, this may have dire consequences, such as poor-quality programs or even project failures. This paper introduces Gamekins, a tool that uses gamification to motivate developers to write more and better tests. Gamekins is integrated into the Jenkins continuous integration platform, where game elements are based on commits to the source code repository: developers can earn points for completing test challenges and quests posed by Gamekins, compete with other developers or developer teams on a leaderboard, and are rewarded for their test-related achievements. A demo video of Gamekins is available at https://youtu.be/qnRWEQim12E; the tool, documentation, and source code are available at https://gamekins.org.
CRISCE. Vuong Nguyen, Alessio Gambi, Jasim Ahmed, G. Fraser. DOI: https://doi.org/10.1145/3510454.3528642
Cyber-Physical Systems are increasingly deployed to perform safety-critical tasks, such as autonomously driving a vehicle. Therefore, thoroughly testing them is paramount to avoid accidents and fatalities. Driving simulators allow developers to address this challenge by testing autonomous vehicles in many driving scenarios; nevertheless, systematically generating scenarios that effectively stress the software controlling the vehicles remains an open challenge. Recent work has shown that effective test cases can be derived from simulations of critical driving scenarios such as car crashes. Hence, generating those simulations is a stepping stone for thoroughly testing autonomous vehicles. Towards this end, we propose CRISCE (CRItical SketChEs), an approach that leverages image processing (e.g., contour analysis) to automatically generate simulations of critical driving scenarios from accident sketches. Preliminary results show that CRISCE is efficient and can generate accurate simulations; hence, it has the potential to support developers in effectively achieving high-quality autonomous vehicles.
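As an illustration of the kind of contour analysis mentioned above (not CRISCE's implementation), the snippet below binarizes a synthetic sketch and extracts object bounding boxes with OpenCV; a real accident sketch would be loaded from an image file instead.

```python
# Contour analysis on a synthetic "accident sketch": binarize the drawing and
# recover bounding boxes that could seed object positions in a simulation.
import cv2
import numpy as np

# Synthetic sketch: white background with one dark rectangle standing in for a vehicle.
sketch = np.full((200, 300), 255, dtype=np.uint8)
cv2.rectangle(sketch, (60, 80), (140, 120), color=0, thickness=-1)

# Binarize (dark strokes become foreground) and extract external contours.
_, binary = cv2.threshold(sketch, 127, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    print(f"object at ({x}, {y}), size {w}x{h}")
```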
{"title":"CRISCE","authors":"Vuong Nguyen, Alessio Gambi, Jasim Ahmed, G. Fraser","doi":"10.1145/3510454.3528642","DOIUrl":"https://doi.org/10.1145/3510454.3528642","url":null,"abstract":"Cyber-Physical Systems are increasingly deployed to perform safety-critical tasks, such as autonomously driving a vehicle. Therefore, thoroughly testing them is paramount to avoid accidents and fatalities. Driving simulators allow developers to address this challenge by testing autonomous vehicles in many driving scenarios; nevertheless, systematically generating scenarios that effectively stress the software controlling the vehicles remains an open challenge. Recent work has shown that effective test cases can be derived from simulations of critical driving scenarios such as car crashes. Hence, generating those simulations is a stepping stone for thoroughly testing autonomous vehicles. Towards this end, we propose CRISCE (CRItical SketChEs), an approach that leverages image processing (e.g., contour analysis) to automatically generate simulations of critical driving scenarios from accident sketches. Preliminary results show that CRISCE is efficient and can generate accurate simulations; hence, it has the potential to support developers in effectively achieving high-quality autonomous vehicles.","PeriodicalId":326006,"journal":{"name":"Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122758800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HUDD. Hazem M. Fahmy, F. Pastore, Lionel C. Briand. DOI: https://doi.org/10.1145/3510454.3516858
We present HUDD, a tool that supports safety analysis practices for systems enabled by Deep Neural Networks (DNNs) by automatically identifying the root causes of DNN errors and retraining the DNN. HUDD stands for Heatmap-based Unsupervised Debugging of DNNs: it automatically clusters error-inducing images whose results are due to common subsets of DNN neurons. The intent is for the generated clusters to group error-inducing images having common characteristics, that is, a common root cause. HUDD identifies root causes by applying a clustering algorithm to matrices (i.e., heatmaps) capturing the relevance of every DNN neuron to the DNN outcome. HUDD also retrains DNNs with images that are automatically selected based on their relatedness to the identified image clusters. Our empirical evaluation with DNNs from the automotive domain has shown that HUDD automatically identifies all the distinct root causes of DNN errors, thus supporting safety analysis. Our retraining approach has also proven more effective at improving DNN accuracy than existing approaches. A demo video of HUDD is available at https://youtu.be/drjVakP7jdU.
DiffWatch. Alexander Prochnow, Jinqiu Yang. DOI: https://doi.org/10.1145/3510454.3516835
Testing deep learning libraries (DLLs) is critically important for ensuring the quality and safety of the many applications built on them. Differential testing is commonly used to help create test oracles, but maintaining such tests poses new challenges. In this tool demo paper, we present DiffWatch, a fully automated tool for Python that identifies differential test practices in DLLs and continuously monitors new changes in external libraries that may require updates to the identified differential tests. Our evaluation on four DLLs demonstrates that DiffWatch detects differential testing with high accuracy. In addition, we present usage examples showing DiffWatch's ability to monitor the development of external libraries and alert DLL maintainers about new changes that may require differential tests to be updated. In short, DiffWatch helps developers react adequately to the code evolution of external libraries. DiffWatch is publicly available, and a demo video can be found at https://www.youtube.com/watch?v=gR7m5QQuSqE.