Pub Date: 2023-02-21; DOI: 10.1109/ICSE-NIER58687.2023.00017
Aniketh Malyala, K. Zhou, Baishakhi Ray, Saikat Chakraborty
With the advent of new and advanced programming languages, it becomes imperative to migrate legacy software to new programming languages. Unsupervised machine learning-based program translation could play an essential role in such migration, even without a sufficiently large and reliable corpus of parallel source code. However, these translators are far from perfect due to their statistical nature. This work investigates unsupervised program translators and where and why they fail. Through in-depth error analysis of such failures, we have identified that the cases where such translators fail follow a few particular patterns. With this insight, we develop a rule-based program mutation engine, which pre-processes the input code if the input follows specific patterns and post-processes the output if the output follows certain patterns. We show that our code processing tool, in conjunction with the program translator, can form a hybrid program translator and significantly improve on the state of the art. In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline using pre- and post-processing steps.
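The hybrid pipeline described in this abstract (rule-based pre-processing, then the ML translator, then rule-based post-processing) can be sketched roughly as follows. This is a minimal illustration, not the authors' engine: the `Rule` class, the two example rules, and `stub_translator` are all hypothetical stand-ins, and a real system would call an actual unsupervised translation model instead of the stub.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A hypothetical rewrite rule: a regex pattern and its replacement."""
    pattern: re.Pattern
    replacement: str

    def apply(self, code: str) -> str:
        return self.pattern.sub(self.replacement, code)


# Assumed example rules, chosen only for illustration.
PRE_RULES: List[Rule] = [
    # Rewrite `x++` into a form the translator is assumed to handle better.
    Rule(re.compile(r"\b(\w+)\+\+"), r"\1 += 1"),
]
POST_RULES: List[Rule] = [
    # Repair a source-language literal left behind in the output.
    Rule(re.compile(r"\bnull\b"), "None"),
]


def apply_rules(code: str, rules: List[Rule]) -> str:
    """Apply each rule in order; rules that do not match leave the code unchanged."""
    for rule in rules:
        code = rule.apply(code)
    return code


def hybrid_translate(source: str, translator: Callable[[str], str]) -> str:
    """Pre-process the input, run the translator, then post-process the output."""
    prepared = apply_rules(source, PRE_RULES)
    raw_output = translator(prepared)
    return apply_rules(raw_output, POST_RULES)


def stub_translator(code: str) -> str:
    """Placeholder for an ML translator: only strips trailing semicolons."""
    return "\n".join(line.rstrip(";") for line in code.splitlines())


print(hybrid_translate("count++;\nresult = null;", stub_translator))
# prints:
# count += 1
# result = None
```

Keeping the rules outside the translator means domain knowledge can be added or removed without retraining the model, which is the design point the abstract argues for.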
Title: On ML-Based Program Translation: Perils and Promises
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2023-02-12; DOI: 10.1109/ICSE-NIER58687.2023.00020
Lee Martie, Jessie Rosenberg, Véronique Demers, Gaoyuan Zhang, Onkar Bhardwaj, John Henning, Aditya Prasad, Matt Stallone, Ja Young Lee, Lucy Yip, D. Adesina, Elahe Paikari, Oscar Resendiz, Sarah Shaw, David Cox
Compositional AI systems, which combine multiple artificial intelligence components with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and the resulting work harder to reuse in future applications. To support the full rapid development cycle of compositional AI applications, we have developed a novel framework called (Bee)* (written as a regular expression and pronounced as "beestar"). We illustrate how (Bee)* supports building integrated, scalable, and interactive compositional AI applications with a simplified developer experience.
Title: Rapid Development of Compositional AI
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2023-02-11; DOI: 10.1109/ICSE-NIER58687.2023.00022
Niroshinie Fernando, Chetan Arora, S. Loke, L. Alam, S. L. Macchia, Helen Graesser
The number of Internet-connected smart devices is increasing at an exponential rate. These powerful devices have created a yet-untapped pool of idle resources that can be utilised, among other purposes, for processing data in resource-depleted environments. The idea of bringing together a pool of smart devices for "crowd computing" (CC) has been studied in the recent past from an infrastructural feasibility perspective. However, for the CC paradigm to be successful, numerous socio-technical and software engineering (SE) factors, specifically those related to requirements engineering (RE), are at play and have not been investigated in the literature. In this paper, we motivate the SE-related aspects of CC and the ideas for implementing mobile apps required for CC scenarios. We present the results of a preliminary study on understanding the human aspects, incentives that motivate users, and CC app requirements, and present our future development plan in this relatively new field of research for SE applications.
Title: Towards Human-Centred Crowd Computing: Software for Better Use of Computational Resources
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2023-02-01; DOI: 10.1109/ICSE-NIER58687.2023.00016
E. Kokinda, Makayla Moster, Paige Rodeghero
Much of software engineering research focuses on tools, algorithms, and optimization of software. Recently, we, as a community, have come to acknowledge that there is a gap in meta-research and in addressing human factors in software engineering research. Through meta-research, we aim to deepen our understanding of online participant recruitment and human-subjects software engineering research. In this paper, we motivate the need to consider the unique challenges that human studies pose in software engineering research. We present several challenges faced by our research team in several distinct research studies, describe how they affected the research, and motivate how, as researchers, we can address these challenges. We present results from a pilot study and group the issues faced into three broad categories: participant recruitment, community engagement, and data poisoning. We further discuss how we can address these challenges and outline the benefits a full study could provide to the software engineering research community.
Title: Under the Bridge: Trolling and the Challenges of Recruiting Software Developers for Empirical Research Studies
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2023-01-21; DOI: 10.1109/ICSE-NIER58687.2023.00011
Lorenz Graf‐Vlachy
Background: Risk-taking is prevalent in a host of activities performed by software engineers on a daily basis, yet there is scant research on it. Aims and Method: We study whether software engineers’ risk-taking is affected by framing effects and by software engineers’ personality. To this end, we perform a survey experiment with 124 software engineers. Results: We find that framing substantially affects their risk-taking. None of the "Big Five" personality traits is related to risk-taking in software engineers after correcting for multiple testing. Conclusions: Software engineers and their managers must be aware of framing effects and account for them properly.
Title: The Risk-Taking Software Engineer: A Framed Portrait
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2022-12-07; DOI: 10.1109/ICSE-NIER58687.2023.00008
Meriem Ben Chaaben, Lola Burgueño, H. Sahraoui
We propose a simple yet novel approach to improve completion in domain modeling activities. Our approach exploits the power of large language models by using few-shot prompt learning, without the need to train or fine-tune those models on large datasets, which are scarce in this field. We implemented our approach and tested it on the completion of static and dynamic domain diagrams. Our initial evaluation shows that such an approach is effective and can be integrated in different ways during modeling activities.
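To make the few-shot idea in this abstract concrete, the sketch below assembles a prompt from a handful of (partial model, completion) pairs followed by the model to complete. The textual format, the `build_few_shot_prompt` helper, and the toy domain models are assumptions for illustration only; the abstract does not specify the authors' prompt layout, and no language model is called here.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    partial_model: str,
) -> str:
    """Concatenate solved examples, then leave the final completion open.

    Each example is a (partial model, completion) pair written in a
    hypothetical one-line textual notation for class diagrams.
    """
    parts = [
        f"Partial model: {partial}\nCompletion: {completion}"
        for partial, completion in examples
    ]
    # The trailing "Completion:" is what the language model would continue.
    parts.append(f"Partial model: {partial_model}\nCompletion:")
    return "\n\n".join(parts)


# Toy examples, invented for this sketch.
examples = [
    ("Library(name); Book(title)", "Book(author); Library has Book"),
    ("Bank(name); Account(number)", "Account(balance); Bank has Account"),
]

prompt = build_few_shot_prompt(examples, "Hospital(name); Patient(id)")
print(prompt)
```

Because the examples travel inside the prompt, swapping in examples from a different modeling domain requires no training or fine-tuning, only a different list.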
Title: Towards using Few-Shot Prompt Learning for Automating Model Completion
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Pub Date: 2022-06-11; DOI: 10.1109/ICSE-NIER58687.2023.00007
Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, Yves Le Traon
Distribution shift has been a longstanding challenge for the reliable deployment of deep learning (DL) models because it causes unexpected accuracy degradation. Although DL has become a driving force for large-scale source code analysis in the big code era, limited progress has been made on distribution shift analysis and benchmarking for source code tasks. To fill this gap, this paper proposes CodeS, a distribution shift benchmark dataset for source code learning. Specifically, CodeS supports two programming languages (Java and Python) and five shift types (task, programmer, time-stamp, token, and concrete syntax tree). Extensive experiments based on CodeS reveal that 1) out-of-distribution detectors from other domains (e.g., computer vision) do not generalize to source code, 2) all code classification models suffer from distribution shifts, 3) representation-based shifts have a higher impact on the model than others, and 4) pretrained bimodal models are relatively more resistant to distribution shifts.
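As a rough illustration of one of the five shift types named above (time-stamp shift), the sketch below splits samples at a cutoff year and compares a classifier's accuracy on the two sides. It does not use the CodeS dataset or its API: the `Sample` type, the toy data, and the keyword-based `toy_model` are all invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Sample:
    code: str   # source code snippet
    label: int  # hypothetical class: 1 = contains a loop, 0 = does not
    year: int   # time-stamp used to define the shift


def split_by_time(samples: List[Sample], cutoff_year: int) -> Tuple[List[Sample], List[Sample]]:
    """Return (in-distribution, shifted) sets: before vs. from the cutoff year on."""
    older = [s for s in samples if s.year < cutoff_year]
    newer = [s for s in samples if s.year >= cutoff_year]
    return older, newer


def accuracy(model: Callable[[str], int], samples: List[Sample]) -> float:
    """Fraction of samples the model labels correctly (0.0 for an empty set)."""
    if not samples:
        return 0.0
    correct = sum(1 for s in samples if model(s.code) == s.label)
    return correct / len(samples)


def toy_model(code: str) -> int:
    """Stand-in classifier that only recognises `while` loops."""
    return 1 if "while" in code else 0


# Invented data: older code uses `while`, newer code uses `for`.
samples = [
    Sample("while n > 0: n -= 1", 1, 2015),
    Sample("total = 0", 0, 2016),
    Sample("for item in items: use(item)", 1, 2021),
    Sample("name = 'a'", 0, 2022),
]

older, newer = split_by_time(samples, cutoff_year=2020)
gap = accuracy(toy_model, older) - accuracy(toy_model, newer)
print(f"in-distribution: {accuracy(toy_model, older):.2f}, "
      f"shifted: {accuracy(toy_model, newer):.2f}, gap: {gap:.2f}")
# prints: in-distribution: 1.00, shifted: 0.50, gap: 0.50
```

The same split-then-compare pattern would apply to the other shift types by changing the field used to partition the samples, for example a programmer identifier instead of a year.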
Title: CodeS: Towards Code Model Generalization Under Distribution Shift
Venue: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)