Estimating the software projects’ efforts developed by agile methods is important for project managers or technical leads. It provides a summary as a first view of how many hours and developers are required to complete the tasks. There are research works on automatic predicting the software efforts, including Term Frequency - Inverse Document Frequency (TFIDF) as the traditional approach for this problem. Graph Neural Network is a new approach that has been applied in Natural Language Processing for text classification. The advantages of Graph Neural Network are based on the ability to learn information via graph data structure, which has more representations such as the relationships between words compared to approaches of vectorizing sequence of words. In this paper, we show the potential and possible challenges of Graph Neural Network text classification in story point level estimation. By the experiments, we show that the GNN Text Level Classification can achieve as high accuracy as about 80% for story points level classification, which is comparable to the traditional approach. We also analyze the GNN approach and point out several current disadvantages that the GNN approach can improve for this problem or other problems in software engineering.
{"title":"Story Point Level Classification by Text Level Graph Neural Network","authors":"H. Phan, A. Jannesari","doi":"10.1145/3528588.3528654","DOIUrl":"https://doi.org/10.1145/3528588.3528654","url":null,"abstract":"Estimating the software projects’ efforts developed by agile methods is important for project managers or technical leads. It provides a summary as a first view of how many hours and developers are required to complete the tasks. There are research works on automatic predicting the software efforts, including Term Frequency - Inverse Document Frequency (TFIDF) as the traditional approach for this problem. Graph Neural Network is a new approach that has been applied in Natural Language Processing for text classification. The advantages of Graph Neural Network are based on the ability to learn information via graph data structure, which has more representations such as the relationships between words compared to approaches of vectorizing sequence of words. In this paper, we show the potential and possible challenges of Graph Neural Network text classification in story point level estimation. By the experiments, we show that the GNN Text Level Classification can achieve as high accuracy as about 80% for story points level classification, which is comparable to the traditional approach. We also analyze the GNN approach and point out several current disadvantages that the GNN approach can improve for this problem or other problems in software engineering.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115874168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually assigned by the project’s developers. However, manually labeling software reports is a time-consuming and error-prone task. In this paper, we describe a BERT-based classification technique to automatically label issues as questions, bugs, or enhancements. We evaluate our approach using a dataset containing over 800,000 labeled issues from real open source projects available on GitHub. Our approach classified reported issues with an average F1-score of 0.8571. Our technique outperforms a previous machine learning technique based on FastText.
{"title":"BERT-Based GitHub Issue Report Classification","authors":"Mohammed Latif Siddiq, Joanna C. S. Santos","doi":"10.1145/3528588.3528660","DOIUrl":"https://doi.org/10.1145/3528588.3528660","url":null,"abstract":"Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually assigned by the project’s developers. However, manually labeling software reports is a time-consuming and error-prone task. In this paper, we describe a BERT-based classification technique to automatically label issues as questions, bugs, or enhancements. We evaluate our approach using a dataset containing over 800,000 labeled issues from real open source projects available on GitHub. Our approach classified reported issues with an average F1-score of 0.8571. Our technique outperforms a previous machine learning technique based on FastText.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134124468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments, and thus improve the experience of software engineers seeking help on Stack Overflow.
{"title":"Automatic Identification of Informative Code in Stack Overflow Posts","authors":"Preetha Chatterjee","doi":"10.1145/3528588.3528656","DOIUrl":"https://doi.org/10.1145/3528588.3528656","url":null,"abstract":"Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments, and thus improve the experience of software engineers seeking help on Stack Overflow.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132097393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rafael Kallis, Oscar Chaparro, Andrea Di Sorbo, Sebastiano Panichella
We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE’22). This year, five teams submitted multiple classification models to automatically classify issue reports as bugs, enhancements, or questions. Most of them are based on BERT (Bidirectional Encoder Representations from Transformers) and were fine-tuned and evaluated on a benchmark dataset of 800k issue reports. The goal of the competition was to improve the classification performance of a baseline model based on fastText. This report provides details of the competition, including its rules, the teams and contestant models, and the ranking of models based on their average classification performance across the issue types.
{"title":"NLBSE’22 Tool Competition","authors":"Rafael Kallis, Oscar Chaparro, Andrea Di Sorbo, Sebastiano Panichella","doi":"10.1145/3528588.3528664","DOIUrl":"https://doi.org/10.1145/3528588.3528664","url":null,"abstract":"We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE’22). This year, five teams submitted multiple classification models to automatically classify issue reports as bugs, enhancements, or questions. Most of them are based on BERT (Bidirectional Encoder Representations from Transformers) and were fine-tuned and evaluated on a benchmark dataset of 800k issue reports. The goal of the competition was to improve the classification performance of a baseline model based on fastText. This report provides details of the competition, including its rules, the teams and contestant models, and the ranking of models based on their average classification performance across the issue types.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124331067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph databases employ graph structures such as nodes, attributes and edges to model and store relationships among data. To access this data, graph query languages (GQL) such as Cypher are typically used, which might be difficult to master for end-users. In the context of relational databases, sequence to SQL models, which translate natural language questions to SQL queries, have been proposed. While these Neural Machine Translation (NMT) models increase the accessibility of relational databases, NMT models for graph databases are not yet available mainly due to the lack of suitable parallel training data. In this short paper we sketch an architecture which enables the generation of synthetic training data for the graph query language Cypher.
{"title":"From Zero to Hero: Generating Training Data for Question-To-Cypher Models","authors":"Dominik Opitz, N. Hochgeschwender","doi":"10.1145/3528588.3528655","DOIUrl":"https://doi.org/10.1145/3528588.3528655","url":null,"abstract":"Graph databases employ graph structures such as nodes, attributes and edges to model and store relationships among data. To access this data, graph query languages (GQL) such as Cypher are typically used, which might be difficult to master for end-users. In the context of relational databases, sequence to SQL models, which translate natural language questions to SQL queries, have been proposed. While these Neural Machine Translation (NMT) models increase the accessibility of relational databases, NMT models for graph databases are not yet available mainly due to the lack of suitable parallel training data. In this short paper we sketch an architecture which enables the generation of synthetic training data for the graph query language Cypher.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128885756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pre-trained transformer models are the current state-of-the-art for natural language models processing. seBERT is such a model, that was developed based on the BERT architecture, but trained from scratch with software engineering data. We fine-tuned this model for the NLBSE challenge for the task of issue type prediction. Our model dominates the baseline fastText for all three issue types in both recall and precision to achieve an overall F1-score of 85.7%, which is an increase of 4.1% over the baseline.
{"title":"Predicting Issue Types with seBERT","authors":"Alexander Trautsch, S. Herbold","doi":"10.1145/3528588.3528661","DOIUrl":"https://doi.org/10.1145/3528588.3528661","url":null,"abstract":"Pre-trained transformer models are the current state-of-the-art for natural language models processing. seBERT is such a model, that was developed based on the BERT architecture, but trained from scratch with software engineering data. We fine-tuned this model for the NLBSE challenge for the task of issue type prediction. Our model dominates the baseline fastText for all three issue types in both recall and precision to achieve an overall F1-score of 85.7%, which is an increase of 4.1% over the baseline.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121341012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowing the topics of a software forum post, such as those on StackOverflow, allows for greater analysis and understanding of the large amounts of data that come from these communities. One approach to this problem is using extreme multi label classification (XMLC) to predict the topic (or “tag”) of a post from a potentially very large candidate label set. While previous work has trained these models on data which has explicit text-to-tag information, we assess the classification ability of embedding models which have not been trained using such structured data (and are thus “unsupervised”) to assess the potential applicability to other forums or domains in which tag data is not available.We evaluate 14 unsupervised pre-trained models on 0.1% of all StackOverflow posts against all 61,662 possible StackOverflow tags. We find that an MPNet model trained partially on unlabelled StackExchange data (i.e. without tag data) achieves the highest score overall for this task, with a recall score of 0.161 R@1. These results inform which models are most appropriate for use in XMLC of StackOverflow posts when supervised training is not feasible. This offers insight into these models’ applicability in similar but not identical domains, such as software product forums. These results suggest that training embedding models using in-domain title-body or question-answer pairs can create an effective zero-shot topic classifier for situations where no topic data is available.
{"title":"Unsupervised Extreme Multi Label Classification of Stack Overflow Posts","authors":"Peter Devine, Kelly Blincoe","doi":"10.1145/3528588.3528652","DOIUrl":"https://doi.org/10.1145/3528588.3528652","url":null,"abstract":"Knowing the topics of a software forum post, such as those on StackOverflow, allows for greater analysis and understanding of the large amounts of data that come from these communities. One approach to this problem is using extreme multi label classification (XMLC) to predict the topic (or “tag”) of a post from a potentially very large candidate label set. While previous work has trained these models on data which has explicit text-to-tag information, we assess the classification ability of embedding models which have not been trained using such structured data (and are thus “unsupervised”) to assess the potential applicability to other forums or domains in which tag data is not available.We evaluate 14 unsupervised pre-trained models on 0.1% of all StackOverflow posts against all 61,662 possible StackOverflow tags. We find that an MPNet model trained partially on unlabelled StackExchange data (i.e. without tag data) achieves the highest score overall for this task, with a recall score of 0.161 R@1. These results inform which models are most appropriate for use in XMLC of StackOverflow posts when supervised training is not feasible. This offers insight into these models’ applicability in similar but not identical domains, such as software product forums. These results suggest that training embedding models using in-domain title-body or question-answer pairs can create an effective zero-shot topic classifier for situations where no topic data is available.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124474795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).
{"title":"Issue Report Classification Using Pre-trained Language Models","authors":"Giuseppe Colavito, F. Lanubile, Nicole Novielli","doi":"10.1145/3528588.3528659","DOIUrl":"https://doi.org/10.1145/3528588.3528659","url":null,"abstract":"This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for the automatic classification of GitHub issues. We experimented with different pre-trained models, achieving the best performance with fine-tuned RoBERTa (F1 = .8591).","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"3 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rand Alchokr, M. Borkar, Sharanya Thotadarya, G. Saake, Thomas Leich
Background: Systematic Literature Reviews are an important research method for gathering and evaluating the available evidence regarding a specific research topic. However, the process of conducting a Systematic Literature Review manually can be difficult and time-consuming. For this reason, researchers aim to semi-automate this process or some of its phases.Aim: We aimed at using a deep-learning based contextualized embeddings clustering technique involving transformer-based language models and a weighted scheme to accelerate the conduction phase of Systematic Literature Reviews for efficiently scanning the initial set of retrieved publications.Method: We performed an experiment using two manually conducted SLRs to evaluate the performance of two deep-learning-based clustering models. These models build on transformer-based deep language models (i.e., BERT and S-BERT) to extract contextualized embeddings on different text levels along with a weighted scheme to cluster similar publications.Results: Our primary results show that clustering based on embedding at paragraph-level using S-BERT-paragraph represents the best performing model setting in terms of optimizing the required parameters such as correctly identifying primary studies, number of additional documents identified as part of the relevant cluster and the execution time of the experiments.Conclusions: The findings indicate that using natural-language-based deep-learning architectures for semi-automating the selection of primary studies can accelerate the scanning and identification process. While our results represent first insights only, such a technique seems to enhance SLR process, promising to help researchers identify the most relevant publications more quickly and efficiently.
{"title":"Supporting Systematic Literature Reviews Using Deep-Learning-Based Language Models","authors":"Rand Alchokr, M. Borkar, Sharanya Thotadarya, G. Saake, Thomas Leich","doi":"10.1145/3528588.3528658","DOIUrl":"https://doi.org/10.1145/3528588.3528658","url":null,"abstract":"Background: Systematic Literature Reviews are an important research method for gathering and evaluating the available evidence regarding a specific research topic. However, the process of conducting a Systematic Literature Review manually can be difficult and time-consuming. For this reason, researchers aim to semi-automate this process or some of its phases.Aim: We aimed at using a deep-learning based contextualized embeddings clustering technique involving transformer-based language models and a weighted scheme to accelerate the conduction phase of Systematic Literature Reviews for efficiently scanning the initial set of retrieved publications.Method: We performed an experiment using two manually conducted SLRs to evaluate the performance of two deep-learning-based clustering models. These models build on transformer-based deep language models (i.e., BERT and S-BERT) to extract contextualized embeddings on different text levels along with a weighted scheme to cluster similar publications.Results: Our primary results show that clustering based on embedding at paragraph-level using S-BERT-paragraph represents the best performing model setting in terms of optimizing the required parameters such as correctly identifying primary studies, number of additional documents identified as part of the relevant cluster and the execution time of the experiments.Conclusions: The findings indicate that using natural-language-based deep-learning architectures for semi-automating the selection of primary studies can accelerate the scanning and identification process. While our results represent first insights only, such a technique seems to enhance SLR process, promising to help researchers identify the most relevant publications more quickly and efficiently.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127673951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, the application of neural word embeddings for detecting cross-domain ambiguities in software requirements has gained a significant attention from the requirements engineering (RE) community. Several approaches have been proposed in the literature for estimating the variation of meaning of commonly used terms in different domains. A major limitation of these techniques is that they are unable to identify and detect the terms that have been used in different contexts within the same application domain, i.e. intra-domain ambiguities or in a requirements document of an interdisciplinary project. We propose an approach based on the idea of bidirectional encoder representations from Transformers (BERT) and clustering for identifying such ambiguities. For every context in which a term has been used in the document, our approach returns a list of its most similar words and also provides some example sentences from the corpus highlighting its context-specific interpretation. We apply our approach to a computer science (CS) specific corpora and a multi-domain corpora which consists of textual data from eight different application domains. Our experimental results show that this approach is very effective in identifying and detecting intra-domain ambiguities.
{"title":"Identification of Intra-Domain Ambiguity using Transformer-based Machine Learning","authors":"A. Moharil, Arpit Sharma","doi":"10.1145/3528588.3528651","DOIUrl":"https://doi.org/10.1145/3528588.3528651","url":null,"abstract":"Recently, the application of neural word embeddings for detecting cross-domain ambiguities in software requirements has gained a significant attention from the requirements engineering (RE) community. Several approaches have been proposed in the literature for estimating the variation of meaning of commonly used terms in different domains. A major limitation of these techniques is that they are unable to identify and detect the terms that have been used in different contexts within the same application domain, i.e. intra-domain ambiguities or in a requirements document of an interdisciplinary project. We propose an approach based on the idea of bidirectional encoder representations from Transformers (BERT) and clustering for identifying such ambiguities. For every context in which a term has been used in the document, our approach returns a list of its most similar words and also provides some example sentences from the corpus highlighting its context-specific interpretation. We apply our approach to a computer science (CS) specific corpora and a multi-domain corpora which consists of textual data from eight different application domains. Our experimental results show that this approach is very effective in identifying and detecting intra-domain ambiguities.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"19 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120888076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}