Software ecosystems contain several types of artefacts such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70, 000 libraries. Given the size of the ecosystem, selecting a library represents a challenge to its users. The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.
{"title":"Automatic library categorization","authors":"Camilo Velázquez-Rodríguez, Coen De Roover","doi":"10.1145/3387940.3392186","DOIUrl":"https://doi.org/10.1145/3387940.3392186","url":null,"abstract":"Software ecosystems contain several types of artefacts such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70, 000 libraries. Given the size of the ecosystem, selecting a library represents a challenge to its users. The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130951541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developers often face a dilemma: to seek assistance from a colleague or to expend effort to answer a question herself. On one hand, seeking help is fast and reliable. But on the other, seeking assistance can distract colleagues and reduce their productivity. In this paper, we report our preliminary findings of assistance-seeking from an observational study at a medium-sized software company. We found that developers have varying levels of spoken communication when seeking help. We believe this is correlated with their different years of experience working as developers, among other factors. We also found that many employees would avoid asking for help several times due to various work-related and reported personal reasons. This has driven us to explore a new, exciting research area discovering the complexities of developers seeking help. This paper is our first analysis of this kind, and we hope to receive the community's feedback before continued work.
{"title":"An Exploratory Field Study of Programmer Assistance-Seeking during Software Development","authors":"Paige Rodeghero","doi":"10.1145/3387940.3392237","DOIUrl":"https://doi.org/10.1145/3387940.3392237","url":null,"abstract":"Developers often face a dilemma: to seek assistance from a colleague or to expend effort to answer a question herself. On one hand, seeking help is fast and reliable. But on the other, seeking assistance can distract colleagues and reduce their productivity. In this paper, we report our preliminary findings of assistance-seeking from an observational study at a medium-sized software company. We found that developers have varying levels of spoken communication when seeking help. We believe this is correlated with their different years of experience working as developers, among other factors. We also found that many employees would avoid asking for help several times due to various work-related and reported personal reasons. This has driven us to explore a new, exciting research area discovering the complexities of developers seeking help. This paper is our first analysis of this kind, and we hope to receive the community's feedback before continued work.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131440479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Owain Parry, G. M. Kapfhammer, Michael C Hilton, Phil McMinn
Since flaky tests pass or fail nondeterministically, without any code changes, they are an unreliable indicator of program quality. Developers may quarantine or delete flaky tests because it is often too time consuming to repair them. Yet, since decommissioning too many tests may ultimately degrade a test suite's effectiveness, developers may eventually want to fix them, a process that is challenging because the nondeterminism may have been introduced previously. We contend that the best time to discover and repair a flaky test is when a developer first creates and best understands it. We refer to tests that are not currently flaky, but that could become so, as having latent flakiness. We further argue that efforts to expose and repair latent flakiness are valuable in ensuring the future-reliability of the test suite, and that the testing cost is greater if latent flakiness is left to manifest itself later. Using concrete examples from a real-world program, this paper posits that automated program repair techniques will prove useful for surfacing latent flakiness.
{"title":"Flake It 'Till You Make It: Using Automated Repair to Induce and Fix Latent Test Flakiness","authors":"Owain Parry, G. M. Kapfhammer, Michael C Hilton, Phil McMinn","doi":"10.1145/3387940.3392177","DOIUrl":"https://doi.org/10.1145/3387940.3392177","url":null,"abstract":"Since flaky tests pass or fail nondeterministically, without any code changes, they are an unreliable indicator of program quality. Developers may quarantine or delete flaky tests because it is often too time consuming to repair them. Yet, since decommissioning too many tests may ultimately degrade a test suite's effectiveness, developers may eventually want to fix them, a process that is challenging because the nondeterminism may have been introduced previously. We contend that the best time to discover and repair a flaky test is when a developer first creates and best understands it. We refer to tests that are not currently flaky, but that could become so, as having latent flakiness. We further argue that efforts to expose and repair latent flakiness are valuable in ensuring the future-reliability of the test suite, and that the testing cost is greater if latent flakiness is left to manifest itself later. Using concrete examples from a real-world program, this paper posits that automated program repair techniques will prove useful for surfacing latent flakiness.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124745814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamically forming networks of cyber-physical systems are becoming increasingly widespread in manufacturing, transportation, automotive, avionics and more domains. The emergence of future internet technology and the ambition for ever closer integration of different systems leads to highly collaborative cyber-physical systems. Such cyber-physical systems form networks to provide additional functions, behavior, and benefits the individual systems cannot provide on their own. As safety is a major concern of systems from these domains, there is a need to provide adequate support for safety analyses of these collaborative cyber-physical systems. This support must explicitly consider the dynamically formed networks of cyber-physical systems. This is a challenging task as the configurations of these cyber-physical system networks (i.e. the architecture of the super system the individual system joins) can differ enormously depending on the actual systems joining a cyber-physical system network. Furthermore, the configuration of the network heavily impacts the adaptations performed by the individual systems and thereby impacting the architecture not only of the system network but of all individual systems involved. As existing safety analysis techniques, however, are not meant for supporting such an array of potential system network configurations the individual system will have to be able to cope with at runtime, we propose automated support for safety analysis for these systems that considers the configuration of the system network. Initial evaluation results from the application to industrial case examples show that the proposed support can aid in the detection of safety defects.
{"title":"Towards automated safety analysis for architectures of dynamically forming networks of cyber-physical systems","authors":"Jennifer Brings, Marian Daun","doi":"10.1145/3387940.3391474","DOIUrl":"https://doi.org/10.1145/3387940.3391474","url":null,"abstract":"Dynamically forming networks of cyber-physical systems are becoming increasingly widespread in manufacturing, transportation, automotive, avionics and more domains. The emergence of future internet technology and the ambition for ever closer integration of different systems leads to highly collaborative cyber-physical systems. Such cyber-physical systems form networks to provide additional functions, behavior, and benefits the individual systems cannot provide on their own. As safety is a major concern of systems from these domains, there is a need to provide adequate support for safety analyses of these collaborative cyber-physical systems. This support must explicitly consider the dynamically formed networks of cyber-physical systems. This is a challenging task as the configurations of these cyber-physical system networks (i.e. the architecture of the super system the individual system joins) can differ enormously depending on the actual systems joining a cyber-physical system network. Furthermore, the configuration of the network heavily impacts the adaptations performed by the individual systems and thereby impacting the architecture not only of the system network but of all individual systems involved. As existing safety analysis techniques, however, are not meant for supporting such an array of potential system network configurations the individual system will have to be able to cope with at runtime, we propose automated support for safety analysis for these systems that considers the configuration of the system network. Initial evaluation results from the application to industrial case examples show that the proposed support can aid in the detection of safety defects.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127453680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work proposes to predict the tags assigned for the posts on Stack Overflow platform. The raw data was obtained from the stackexchange.com including more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.
{"title":"Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification","authors":"E. M. Kavuk, Ayse Tosun Misirli","doi":"10.1145/3387940.3391491","DOIUrl":"https://doi.org/10.1145/3387940.3391491","url":null,"abstract":"This work proposes to predict the tags assigned for the posts on Stack Overflow platform. The raw data was obtained from the stackexchange.com including more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Similarity analysis plays an important role in automated program repair by finding the correct solution earlier. However, the effectiveness of similarity is mostly validated using common benchmark Defects4J which consists of 6 large projects. To mitigate the threat of generalizability, this study examines the performance of similarity in repairing small programs. For this purpose, existing syntactic and semantic similarity based approaches, as well as a new technique of combining both similarities, are used. These approaches are evaluated using QuixBugs, a dataset of diverse type bugs from 40 small programs. These techniques fix bugs faster by validating fewer patches than random patch selection based approach. Thus, it proves the effectiveness of similarity in repairing small programs.
{"title":"Impact of Similarity on Repairing Small Programs: A Case Study on QuixBugs Benchmark","authors":"Moumita Asad, Kishan Kumar Ganguly, K. Sakib","doi":"10.1145/3387940.3392182","DOIUrl":"https://doi.org/10.1145/3387940.3392182","url":null,"abstract":"Similarity analysis plays an important role in automated program repair by finding the correct solution earlier. However, the effectiveness of similarity is mostly validated using common benchmark Defects4J which consists of 6 large projects. To mitigate the threat of generalizability, this study examines the performance of similarity in repairing small programs. For this purpose, existing syntactic and semantic similarity based approaches, as well as a new technique of combining both similarities, are used. These approaches are evaluated using QuixBugs, a dataset of diverse type bugs from 40 small programs. These techniques fix bugs faster by validating fewer patches than random patch selection based approach. Thus, it proves the effectiveness of similarity in repairing small programs.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129403353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of identifying critical components in large scale networked Cyber-Physical Systems comes up as an underlying issue when attempting to enhance the efficiency, the safety and the security of such systems. Graph theory is one of the well-studied methods that are often used to model complex systems and to facilitate the analysis of network-based features of systems to identify critical components. However, recent studies mainly focus on identifying influential nodes in a system and neglect the importance of links. In this paper, we heed to the identification of both key links and nodes in a system, and we aggregate the result by leveraging the multi-variable synthetic evaluation and multiple-criteria decision-making M-TOPSIS method to rank the system components based on their importance.
{"title":"Identifying Critical Components in Large Scale Cyber Physical Systems","authors":"Aida Akbarzadeh, S. Katsikas","doi":"10.1145/3387940.3391473","DOIUrl":"https://doi.org/10.1145/3387940.3391473","url":null,"abstract":"The problem of identifying critical components in large scale networked Cyber-Physical Systems comes up as an underlying issue when attempting to enhance the efficiency, the safety and the security of such systems. Graph theory is one of the well-studied methods that are often used to model complex systems and to facilitate the analysis of network-based features of systems to identify critical components. However, recent studies mainly focus on identifying influential nodes in a system and neglect the importance of links. In this paper, we heed to the identification of both key links and nodes in a system, and we aggregate the result by leveraging the multi-variable synthetic evaluation and multiple-criteria decision-making M-TOPSIS method to rank the system components based on their importance.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127646030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper aims to identify and model relationships between cryptocurrencies market price changes and topic discussion occurrences on social media. The considered cryptocurrencies are the two highest in value at the moment, Bitcoin and Ethereum. At the same time, topics were realized through a classification of the comments gained from the Reddit social media platform, implementing a Hawkes model. The results highlight that it is possible to identify some interactions among the considered features, and it appears that some topics are indicative of certain types of price movements. Specifically, the discussions concerning issues about government, trading and Ethereum cryptocurrency as an exchange currency, appear to affect Bitcoin and Ethereum prices negatively. The discussions of investment appear to be indicative of price rises, while the discussions related to new decentralized realities and technological applications is indicative of price falls.
{"title":"Investigation of Mutual-Influence among Blockchain Development Communities and Cryptocurrency Price Changes","authors":"Nicola Uras, Stefano Vacca, Giuseppe Destefanis","doi":"10.1145/3387940.3392245","DOIUrl":"https://doi.org/10.1145/3387940.3392245","url":null,"abstract":"This paper aims to identify and model relationships between cryptocurrencies market price changes and topic discussion occurrences on social media. The considered cryptocurrencies are the two highest in value at the moment, Bitcoin and Ethereum. At the same time, topics were realized through a classification of the comments gained from the Reddit social media platform, implementing a Hawkes model. The results highlight that it is possible to identify some interactions among the considered features, and it appears that some topics are indicative of certain types of price movements. Specifically, the discussions concerning issues about government, trading and Ethereum cryptocurrency as an exchange currency, appear to affect Bitcoin and Ethereum prices negatively. The discussions of investment appear to be indicative of price rises, while the discussions related to new decentralized realities and technological applications is indicative of price falls.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127705588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using the W3C PROV data model, we present a general provenance model for software development processes and---as an example---specialized models for git services, for which we generate provenance graphs. Provenance graphs are knowledge graphs, since they have defined semantics, and can be analyzed with graph algorithms or semantic reasoning to get insights into processes.
{"title":"Modelling Knowledge about Software Processes using Provenance Graphs and its Application to Git-based Version Control Systems","authors":"A. Schreiber, C. D. Boer","doi":"10.1145/3387940.3392220","DOIUrl":"https://doi.org/10.1145/3387940.3392220","url":null,"abstract":"Using the W3C PROV data model, we present a general provenance model for software development processes and---as an example---specialized models for git services, for which we generate provenance graphs. Provenance graphs are knowledge graphs, since they have defined semantics, and can be analyzed with graph algorithms or semantic reasoning to get insights into processes.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130903791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex Dekhtyar, Bruno Carreiro da Silva, Karson Slocum
In college coursework, we take care to educate future professional software engineers on how software development process works. Computer Science and Software Engineering students across the globe study software process models, gather requirements, design, implement and test their software, work on software maintenance, learn to submit bug reports, build project roadmaps, construct UML diagrams, and deploy software. Yet, ever since the emergence of consumer-facing software, software development often is a collaboration between professional software engineers and multiple stakeholders whose education, professional expertise, and general experience lie outside of computing. We teach future software engineers how to develop software. Why don't we do the same with other future stakeholders? This paper is a description of a pilot Software Engineering Without Programming course developed and taught at our university for the first time in 2020. In this early stage report (the course is ongoing as of the submisison deadline, but will have been completed by the time of the workshop) we outline the need for the course, its learning objectives, its organization, and the expected results.
{"title":"Educating Project Stakeholders: A Preliminary Report","authors":"Alex Dekhtyar, Bruno Carreiro da Silva, Karson Slocum","doi":"10.1145/3387940.3392164","DOIUrl":"https://doi.org/10.1145/3387940.3392164","url":null,"abstract":"In college coursework, we take care to educate future professional software engineers on how software development process works. Computer Science and Software Engineering students across the globe study software process models, gather requirements, design, implement and test their software, work on software maintenance, learn to submit bug reports, build project roadmaps, construct UML diagrams, and deploy software. Yet, ever since the emergence of consumer-facing software, software development often is a collaboration between professional software engineers and multiple stakeholders whose education, professional expertise, and general experience lie outside of computing. We teach future software engineers how to develop software. Why don't we do the same with other future stakeholders? This paper is a description of a pilot Software Engineering Without Programming course developed and taught at our university for the first time in 2020. In this early stage report (the course is ongoing as of the submisison deadline, but will have been completed by the time of the workshop) we outline the need for the course, its learning objectives, its organization, and the expected results.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133458833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}