Software ecosystems contain several types of artefacts such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70,000 libraries. Given the size of the ecosystem, selecting a library represents a challenge to its users. The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.
{"title":"Automatic library categorization","authors":"Camilo Velázquez-Rodríguez, Coen De Roover","doi":"10.1145/3387940.3392186","DOIUrl":"https://doi.org/10.1145/3387940.3392186","url":null,"abstract":"Software ecosystems contain several types of artefacts such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70, 000 libraries. Given the size of the ecosystem, selecting a library represents a challenge to its users. The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130951541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
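To make the idea of classifying libraries from class and method names more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes identifiers are split into lowercase word tokens and that a library is assigned to the category whose training identifiers share the most tokens with it. The function names, the scoring rule and the sample data are all hypothetical.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def train(labelled):
    """Build one token-frequency profile per category (hypothetical scheme)."""
    profiles = {}
    for category, identifiers in labelled.items():
        counts = Counter()
        for identifier in identifiers:
            counts.update(split_identifier(identifier))
        profiles[category] = counts
    return profiles


def predict(profiles, identifiers):
    """Return the category whose profile overlaps most with the given names."""
    tokens = [t for identifier in identifiers for t in split_identifier(identifier)]

    def score(category):
        # Counter returns 0 for tokens the category has never seen.
        return sum(profiles[category][t] for t in tokens)

    return max(profiles, key=score)


# Invented training data, for illustration only.
training_data = {
    "json": ["JsonParser", "parseJson", "JsonWriter"],
    "logging": ["Logger", "logMessage", "LogLevel"],
}
profiles = train(training_data)
print(predict(profiles, ["JsonReader", "readJson"]))
```

A real classifier would replace the overlap score with a trained model; the sketch only shows how identifier names can become features.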
Developers often face a dilemma: to seek assistance from a colleague or to expend effort to answer a question herself. On one hand, seeking help is fast and reliable. But on the other, seeking assistance can distract colleagues and reduce their productivity. In this paper, we report our preliminary findings of assistance-seeking from an observational study at a medium-sized software company. We found that developers have varying levels of spoken communication when seeking help. We believe this is correlated with their different years of experience working as developers, among other factors. We also found that many employees would avoid asking for help several times due to various work-related and reported personal reasons. This has driven us to explore a new, exciting research area discovering the complexities of developers seeking help. This paper is our first analysis of this kind, and we hope to receive the community's feedback before continued work.
{"title":"An Exploratory Field Study of Programmer Assistance-Seeking during Software Development","authors":"Paige Rodeghero","doi":"10.1145/3387940.3392237","DOIUrl":"https://doi.org/10.1145/3387940.3392237","url":null,"abstract":"Developers often face a dilemma: to seek assistance from a colleague or to expend effort to answer a question herself. On one hand, seeking help is fast and reliable. But on the other, seeking assistance can distract colleagues and reduce their productivity. In this paper, we report our preliminary findings of assistance-seeking from an observational study at a medium-sized software company. We found that developers have varying levels of spoken communication when seeking help. We believe this is correlated with their different years of experience working as developers, among other factors. We also found that many employees would avoid asking for help several times due to various work-related and reported personal reasons. This has driven us to explore a new, exciting research area discovering the complexities of developers seeking help. This paper is our first analysis of this kind, and we hope to receive the community's feedback before continued work.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131440479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Owain Parry, G. M. Kapfhammer, Michael C Hilton, Phil McMinn
Since flaky tests pass or fail nondeterministically, without any code changes, they are an unreliable indicator of program quality. Developers may quarantine or delete flaky tests because it is often too time consuming to repair them. Yet, since decommissioning too many tests may ultimately degrade a test suite's effectiveness, developers may eventually want to fix them, a process that is challenging because the nondeterminism may have been introduced previously. We contend that the best time to discover and repair a flaky test is when a developer first creates and best understands it. We refer to tests that are not currently flaky, but that could become so, as having latent flakiness. We further argue that efforts to expose and repair latent flakiness are valuable in ensuring the future-reliability of the test suite, and that the testing cost is greater if latent flakiness is left to manifest itself later. Using concrete examples from a real-world program, this paper posits that automated program repair techniques will prove useful for surfacing latent flakiness.
{"title":"Flake It 'Till You Make It: Using Automated Repair to Induce and Fix Latent Test Flakiness","authors":"Owain Parry, G. M. Kapfhammer, Michael C Hilton, Phil McMinn","doi":"10.1145/3387940.3392177","DOIUrl":"https://doi.org/10.1145/3387940.3392177","url":null,"abstract":"Since flaky tests pass or fail nondeterministically, without any code changes, they are an unreliable indicator of program quality. Developers may quarantine or delete flaky tests because it is often too time consuming to repair them. Yet, since decommissioning too many tests may ultimately degrade a test suite's effectiveness, developers may eventually want to fix them, a process that is challenging because the nondeterminism may have been introduced previously. We contend that the best time to discover and repair a flaky test is when a developer first creates and best understands it. We refer to tests that are not currently flaky, but that could become so, as having latent flakiness. We further argue that efforts to expose and repair latent flakiness are valuable in ensuring the future-reliability of the test suite, and that the testing cost is greater if latent flakiness is left to manifest itself later. Using concrete examples from a real-world program, this paper posits that automated program repair techniques will prove useful for surfacing latent flakiness.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124745814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
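As an illustration of what latent flakiness can look like, the sketch below shows a test that passes deterministically today but depends on shared state, so it would start failing once another test runs before it. This is a hypothetical example, not one of the paper's real-world cases; the cache, function names and values are invented.

```python
# Module-level state shared by every test in the (hypothetical) suite.
_cache = {}


def store(key, value):
    """Remember a value under the given key."""
    _cache[key] = value


def lookup(key):
    """Return the stored value, or "default" if nothing was stored."""
    return _cache.get(key, "default")


def test_lookup_default_latent():
    # Latently flaky: passes only while no earlier test has stored "user".
    # Reordering or parallelising the suite can expose the dependency.
    assert lookup("user") == "default"


def test_lookup_default_repaired():
    # Repaired: the test resets the shared state it depends on.
    _cache.clear()
    assert lookup("user") == "default"
```

Calling `store("user", "alice")` before the first test makes it fail, while the repaired test still passes, which is the kind of difference a tool would need to surface.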
This paper aims to identify and model relationships between cryptocurrency market price changes and topic discussion occurrences on social media. The considered cryptocurrencies are the two highest in value at the moment, Bitcoin and Ethereum. The topics were obtained by classifying comments collected from the Reddit social media platform, and a Hawkes model was implemented to relate them to price changes. The results highlight that it is possible to identify some interactions among the considered features, and it appears that some topics are indicative of certain types of price movements. Specifically, discussions concerning government, trading, and the Ethereum cryptocurrency as an exchange currency appear to affect Bitcoin and Ethereum prices negatively. Discussions of investment appear to be indicative of price rises, while discussions related to new decentralized realities and technological applications are indicative of price falls.
{"title":"Investigation of Mutual-Influence among Blockchain Development Communities and Cryptocurrency Price Changes","authors":"Nicola Uras, Stefano Vacca, Giuseppe Destefanis","doi":"10.1145/3387940.3392245","DOIUrl":"https://doi.org/10.1145/3387940.3392245","url":null,"abstract":"This paper aims to identify and model relationships between cryptocurrencies market price changes and topic discussion occurrences on social media. The considered cryptocurrencies are the two highest in value at the moment, Bitcoin and Ethereum. At the same time, topics were realized through a classification of the comments gained from the Reddit social media platform, implementing a Hawkes model. The results highlight that it is possible to identify some interactions among the considered features, and it appears that some topics are indicative of certain types of price movements. Specifically, the discussions concerning issues about government, trading and Ethereum cryptocurrency as an exchange currency, appear to affect Bitcoin and Ethereum prices negatively. The discussions of investment appear to be indicative of price rises, while the discussions related to new decentralized realities and technological applications is indicative of price falls.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127705588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
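For readers unfamiliar with Hawkes models, the following sketch computes the conditional intensity of a univariate Hawkes process. The abstract does not state which kernel or parameterisation was used, so the exponential kernel and the parameter names (mu, alpha, beta) are assumptions made for illustration.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Intensity at time t: mu + sum of alpha * exp(-beta * (t - t_i)).

    mu is the baseline rate, alpha the jump added by each past event and
    beta the decay rate; only events strictly before t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - event_time))
        for event_time in history
        if event_time < t
    )
    return mu + excitation


# Hypothetical event times, e.g. moments at which a topic was discussed.
events = [0.0, 0.5]
print(hawkes_intensity(1.0, events, mu=0.5, alpha=0.8, beta=1.0))
```

Modelling mutual influence between several event types, such as topic occurrences and price movements, would use a multivariate variant in which each pair of event types has its own excitation parameter.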
This work proposes to predict the tags assigned to posts on the Stack Overflow platform. The raw data was obtained from stackexchange.com and includes more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for the 15 most popular tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.
{"title":"Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification","authors":"E. M. Kavuk, Ayse Tosun Misirli","doi":"10.1145/3387940.3391491","DOIUrl":"https://doi.org/10.1145/3387940.3391491","url":null,"abstract":"This work proposes to predict the tags assigned for the posts on Stack Overflow platform. The raw data was obtained from the stackexchange.com including more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
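The step that combines per-tag probabilities into a top-K prediction can be sketched as follows. This assumes each one-against-all classifier has already produced a probability for its tag; the function name and the probability values are hypothetical, and the abstract does not say how ties are handled.

```python
def top_k_tags(tag_probabilities, k):
    """Return the k tags with the highest predicted probability."""
    ranked = sorted(
        tag_probabilities.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [tag for tag, _ in ranked[:k]]


# Invented outputs of three one-against-all classifiers for a single post.
probabilities = {"python": 0.9, "java": 0.2, "pandas": 0.7}
print(top_k_tags(probabilities, k=2))
```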
Using the W3C PROV data model, we present a general provenance model for software development processes and---as an example---specialized models for git services, for which we generate provenance graphs. Provenance graphs are knowledge graphs, since they have defined semantics, and can be analyzed with graph algorithms or semantic reasoning to get insights into processes.
{"title":"Modelling Knowledge about Software Processes using Provenance Graphs and its Application to Git-based Version Control Systems","authors":"A. Schreiber, C. D. Boer","doi":"10.1145/3387940.3392220","DOIUrl":"https://doi.org/10.1145/3387940.3392220","url":null,"abstract":"Using the W3C PROV data model, we present a general provenance model for software development processes and---as an example---specialized models for git services, for which we generate provenance graphs. Provenance graphs are knowledge graphs, since they have defined semantics, and can be analyzed with graph algorithms or semantic reasoning to get insights into processes.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130903791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
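A provenance graph for a single git commit could be sketched as a list of subject, relation, object triples that use W3C PROV relation names. The mapping below (commit as activity, file version as entity, author as agent) and the identifier scheme are assumptions for illustration and are not taken from the paper's model; the sketch also assumes every changed file already existed in the parent commit.

```python
def commit_to_prov(commit_id, author, files, parent_id=None):
    """Describe one commit as PROV-style triples (hypothetical mapping)."""
    activity = f"commit:{commit_id}"
    agent = f"agent:{author}"
    triples = [(activity, "prov:wasAssociatedWith", agent)]
    if parent_id is not None:
        triples.append((activity, "prov:wasInformedBy", f"commit:{parent_id}"))
    for path in files:
        new_version = f"file:{path}@{commit_id}"
        triples.append((new_version, "prov:wasGeneratedBy", activity))
        triples.append((new_version, "prov:wasAttributedTo", agent))
        if parent_id is not None:
            # Assumes the file was present in the parent commit.
            old_version = f"file:{path}@{parent_id}"
            triples.append((activity, "prov:used", old_version))
            triples.append((new_version, "prov:wasDerivedFrom", old_version))
    return triples


for triple in commit_to_prov("c2", "alice", ["README.md"], parent_id="c1"):
    print(triple)
```

Once the triples exist, questions such as "which agents contributed to this file" become graph traversals over the `prov:wasAttributedTo` and `prov:wasDerivedFrom` edges.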
This paper introduces Mutamorphic Relation for Machine Learning Testing. Mutamorphic Relation combines data mutation and metamorphic relations as test oracles for machine learning systems. These oracles can help achieve fully automatic testing as well as automatic repair of the machine learning models. The paper takes TransRepair as an example to show the effectiveness of Mutamorphic Relation in automatically testing and improving machine translators. TransRepair detects inconsistency bugs without access to human oracles. It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsistencies. Manual inspection indicates that the translations repaired by TransRepair improve consistency in 87% of cases (degrading it in 2%), and that the repairs have better translation acceptability in 27% of the cases (worse in 8%).
{"title":"Automatic Improvement of Machine Translation Using Mutamorphic Relation: Invited Talk Paper","authors":"Jie M. Zhang","doi":"10.1145/3387940.3391541","DOIUrl":"https://doi.org/10.1145/3387940.3391541","url":null,"abstract":"This paper introduces Mutamorphic Relation for Machine Learning Testing. Mutamorphic Relation combines data mutation and metamorphic relations as test oracles for machine learning systems. These oracles can help achieve fully automatic testing as well as automatic repair of the machine learning models. The paper takes TransRepair as an example to show the effectiveness of Mutamorphic Relation in automatically testing and improving machine translators, TransRepair detects inconsistency bugs without access to human oracles. It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsistencies. Manual inspection indicates that the translations repaired by TransRepair improve consistency in 87% of cases (degrading it in 2%), and that the repairs of have better translation acceptability in 27% of the cases (worse in 8%).","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133152349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
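The idea of using a mutation plus a metamorphic relation as a test oracle can be sketched with a toy example: replace one word in the input and expect the translation to change in at most one token. This is a simplified stand-in, not TransRepair's actual consistency metric; the toy translator, its planted bug, and the threshold are all hypothetical.

```python
def differing_positions(first, second):
    """Count token positions at which two translations differ."""
    a, b = first.split(), second.split()
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def is_consistent(translate, sentence, original_word, replacement_word, max_diff=1):
    """Mutate one word and check that the translation changes only locally."""
    mutant = " ".join(
        replacement_word if word == original_word else word
        for word in sentence.split()
    )
    return differing_positions(translate(sentence), translate(mutant)) <= max_diff


def toy_translate(sentence):
    """Stand-in translator: upper-cases words, with a planted inconsistency."""
    words = sentence.split()
    if "cats" in words:
        # Planted bug: the word "very" is dropped whenever "cats" appears.
        words = [word for word in words if word != "very"]
    return " ".join(word.upper() for word in words)


print(is_consistent(toy_translate, "dogs are very friendly", "dogs", "birds"))
print(is_consistent(toy_translate, "dogs are very friendly", "dogs", "cats"))
```

No reference translation is needed: the oracle only compares the system's outputs on the original and the mutated input.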
Alex Dekhtyar, Bruno Carreiro da Silva, Karson Slocum
In college coursework, we take care to educate future professional software engineers on how the software development process works. Computer Science and Software Engineering students across the globe study software process models, gather requirements, design, implement and test their software, work on software maintenance, learn to submit bug reports, build project roadmaps, construct UML diagrams, and deploy software. Yet, ever since the emergence of consumer-facing software, software development often is a collaboration between professional software engineers and multiple stakeholders whose education, professional expertise, and general experience lie outside of computing. We teach future software engineers how to develop software. Why don't we do the same with other future stakeholders? This paper is a description of a pilot Software Engineering Without Programming course developed and taught at our university for the first time in 2020. In this early-stage report (the course is ongoing as of the submission deadline, but will have been completed by the time of the workshop) we outline the need for the course, its learning objectives, its organization, and the expected results.
{"title":"Educating Project Stakeholders: A Preliminary Report","authors":"Alex Dekhtyar, Bruno Carreiro da Silva, Karson Slocum","doi":"10.1145/3387940.3392164","DOIUrl":"https://doi.org/10.1145/3387940.3392164","url":null,"abstract":"In college coursework, we take care to educate future professional software engineers on how software development process works. Computer Science and Software Engineering students across the globe study software process models, gather requirements, design, implement and test their software, work on software maintenance, learn to submit bug reports, build project roadmaps, construct UML diagrams, and deploy software. Yet, ever since the emergence of consumer-facing software, software development often is a collaboration between professional software engineers and multiple stakeholders whose education, professional expertise, and general experience lie outside of computing. We teach future software engineers how to develop software. Why don't we do the same with other future stakeholders? This paper is a description of a pilot Software Engineering Without Programming course developed and taught at our university for the first time in 2020. In this early stage report (the course is ongoing as of the submission deadline, but will have been completed by the time of the workshop) we outline the need for the course, its learning objectives, its organization, and the expected results.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133458833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miller Trujillo, M. Linares-Vásquez, Camilo Escobar-Velásquez, Ivana Dusparic, Nicolás Cardozo
Deep Learning (DL) is a powerful family of algorithms used for a wide variety of problems and systems, including safety-critical systems. As a consequence, analyzing, understanding, and testing DL models is attracting more practitioners and researchers with the purpose of implementing DL systems that are robust, reliable, efficient, and accurate. The first software testing approaches for DL systems have focused on black-box testing, white-box testing, and test case generation, in particular for deep neural networks (CNNs and RNNs). However, Deep Reinforcement Learning (DRL), which is a branch of DL extending reinforcement learning, is still out of the scope of research providing testing techniques for DL systems. In this paper, we present a first step towards testing of DRL systems. In particular, we investigate whether neuron coverage (a widely used metric for white-box testing of DNNs) could also be used for DRL systems, by analyzing coverage evolutionary patterns and the correlation with RL rewards.
{"title":"Does Neuron Coverage Matter for Deep Reinforcement Learning?: A Preliminary Study","authors":"Miller Trujillo, M. Linares-Vásquez, Camilo Escobar-Velásquez, Ivana Dusparic, Nicolás Cardozo","doi":"10.1145/3387940.3391462","DOIUrl":"https://doi.org/10.1145/3387940.3391462","url":null,"abstract":"Deep Learning (DL) is powerful family of algorithms used for a wide variety of problems and systems, including safety critical systems. As a consequence, analyzing, understanding, and testing DL models is attracting more practitioners and researchers with the purpose of implementing DL systems that are robust, reliable, efficient, and accurate. First software testing approaches for DL systems have focused on black-box testing, white-box testing, and test cases generation, in particular for deep neural networks (CNNs and RNNs). However, Deep Reinforcement Learning (DRL), which is a branch of DL extending reinforcement learning, is still out of the scope of research providing testing techniques for DL systems. In this paper, we present a first step towards testing of DRL systems. In particular, we investigate whether neuron coverage (a widely used metric for white-box testing of DNNs) could be used also for DRL systems, by analyzing coverage evolutionary patterns, and the correlation with RL rewards.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131612657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
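Neuron coverage is commonly defined as the fraction of neurons that are activated above a threshold by at least one input. The sketch below computes it from recorded activations; the data layout (one list of activations per input), the function name and the threshold are assumptions, and the abstract does not say which variant of the metric the study uses.

```python
def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons whose activation exceeds threshold for any input.

    activations_per_input holds one list per input, with one activation
    value per neuron in a fixed order.
    """
    if not activations_per_input or not activations_per_input[0]:
        return 0.0
    num_neurons = len(activations_per_input[0])
    covered = set()
    for activations in activations_per_input:
        for index, value in enumerate(activations):
            if value > threshold:
                covered.add(index)
    return len(covered) / num_neurons


# Hypothetical activations of three neurons recorded for two inputs.
recorded = [[0.0, 0.9, 0.1], [0.0, 0.2, 0.6]]
print(neuron_coverage(recorded, threshold=0.5))
```

For a DRL agent, the inputs would be the observations seen during episodes, so coverage can be tracked as training progresses and compared against the rewards obtained.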
Software robots, or bots, are useful for automating a wide variety of programming and software development tasks. Despite the advantages of using bots throughout the software engineering process, research shows that developers often face challenges interacting with these systems. To improve automated developer recommendations from bots, this work introduces developer recommendation choice architectures. Choice architecture is a behavioral science concept that suggests the presentation of options impacts the decisions humans make. To evaluate the impact of framing recommendations for software engineers, we examine the impact of one choice architecture, actionability, for improving the design of bot recommendations. We present the results of a preliminary study evaluating this choice architecture in a bot and provide implications for integrating choice architecture into the design of future software engineering bots.
{"title":"Sorry to Bother You Again: Developer Recommendation Choice Architectures for Designing Effective Bots","authors":"Chris Brown, Chris Parnin","doi":"10.1145/3387940.3391506","DOIUrl":"https://doi.org/10.1145/3387940.3391506","url":null,"abstract":"Software robots, or bots, are useful for automating a wide variety of programming and software development tasks. Despite the advantages of using bots throughout the software engineering process, research shows that developers often face challenges interacting with these systems. To improve automated developer recommendations from bots, this work introduces developer recommendation choice architectures. Choice architecture is a behavioral science concept that suggests the presentation of options impacts the decisions humans make. To evaluate the impact of framing recommendations for software engineers, we examine the impact of one choice architecture, actionability, for improving the design of bot recommendations. We present the results of a preliminary study evaluating this choice architecture in a bot and provide implications for integrating choice architecture into the design of future software engineering bots.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131732763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}