We performed a comparative analysis of code generation model performance, evaluating generated code with common NLP metrics and, in parallel, with a test-based evaluation. The investigation was carried out in the context of question answering with code (the text-to-code problem) and aimed to check the applicability of both approaches for evaluating generated code in a fully automatic manner. We applied the pretrained CodeGen and GPTNeo models to a question-answering problem over a Stack Overflow-based corpus (APIzation). For the test-based evaluation, industrial test-generation solutions (Machinet, UTBot) were used to provide automatically generated tests. The analysis showed that performance evaluation based solely on NLP metrics or solely on tests gives a rather limited assessment of generated code quality: we observed predictions with both high and low NLP metric values that pass the tests, as well as ones that do not. Discussing the early results of our empirical study in this paper, we believe that combining both approaches may broaden the options for building, evaluating, and training code generation models.
{"title":"Test-based and metric-based evaluation of code generation models for practical question answering","authors":"Sergey Kovalchuk, Dmitriy Fedrushkov, Vadim Lomshakov, Artem Aliev","doi":"10.1109/ICCQ57276.2023.10114665","DOIUrl":"https://doi.org/10.1109/ICCQ57276.2023.10114665","url":null,"abstract":"We performed a comparative analysis of code generation model performance with evaluation using common NLP metrics in comparison to a test-based evaluation. The investigation was performed in the context of question answering with code (test-to-code problem) and was aimed at applicability checking both ways for generated code evaluation in a fully automatic manner. We used CodeGen and GPTNeo pretrained models applied to a problem of question answering using Stack Overflow-based corpus (APIzation). For test-based evaluation, industrial test-generation solutions (Machinet, UTBot) were used for providing automatically generated tests. The analysis showed that the performance evaluation based solely on NLP metrics or on tests provides a rather limited assessment of generated code quality. We see the evidence that predictions with both high and low NLP metrics exist that pass and don't pass tests. With the early results of our empirical study being discussed in this paper, we believe that the combination of both approaches may increase possible ways for building, evaluating, and training code generation models.","PeriodicalId":318687,"journal":{"name":"2023 International Conference on Code Quality (ICCQ)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114885420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Software Performance Challenges: An Empirical Study on Stack Overflow
Pub Date : 2023-04-22 DOI: 10.1109/ICCQ57276.2023.10114662
Deema Adeeb Al Shoaibi, Mohamed Wiem Mkaouer
Performance is a quality aspect that describes how well the software performs. Any performance degradation further affects other quality aspects, such as usability. Software developers continuously test their code to ensure that additions or changes do not break existing functionality or degrade quality, and they set up strategies to detect, locate, and fix regressions when needed. In this paper, we provide an exploratory study of the challenges developers face in resolving performance regressions. The study is based on questions about performance regression posted on a technical forum. We collected 1828 questions discussing regressions in software execution time and analyzed all of them manually. The study resulted in a categorization of the challenges, and we also discuss the difficulty level of performance regression issues within the developer community. This study provides insights that help developers avoid the causes of regressions during software design and implementation.
{"title":"Understanding Software Performance Challenges an Empirical Study on Stack Overflow","authors":"Deema Adeeb Al Shoaibi, Mohamed Wiem Mkaouer","doi":"10.1109/ICCQ57276.2023.10114662","DOIUrl":"https://doi.org/10.1109/ICCQ57276.2023.10114662","url":null,"abstract":"Performance is a quality aspect describing how the software is performing. Any performance degradation will further affect other quality aspects, such as usability. Software developers continuously conduct testing to ensure that code addition or changes do not damage existing functionalities or negatively affect the quality. Hence, developers set strategies to detect, locate and fix the regression if needed. In this paper, we provide an exploratory study on the challenges developers face in resolving performance regression. The study is based on the questions posted on a technical forum directed to performance regression. We collected 1828 questions discussing the regression of software execution time. All those questions are manually analyzed. The study resulted in a categorization of the challenges. We also discussed the difficulty level of performance regression issues within the developers community. This study provides insights to help developers during the software design and implementation to avoid regression causes.","PeriodicalId":318687,"journal":{"name":"2023 International Conference on Code Quality (ICCQ)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128678679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Machine Learning Analysis for Software Quality Test
Pub Date : 2023-04-22 DOI: 10.1109/ICCQ57276.2023.10114664
Al Khan, R. R. Mekuria, Ruslan Isaev
One of the biggest expenses in software development is maintenance. Therefore, it is critical to understand what triggers maintenance and whether it can be predicted. Numerous studies have demonstrated that specific methods of assessing the complexity of programs can produce useful prediction models for estimating the likelihood of maintenance due to software failures. This is routinely done prior to release, and setting up such models frequently requires particular object-oriented software measurements, which developers do not always have access to. In this paper, machine learning is applied to the available data to calculate cumulative software failure levels. A technique that forecasts a software product's residual defectiveness using machine learning can be seen as a solution to the challenge of predicting residual flaws. Software metrics and defect data were extracted from a static source code repository: the metrics were computed from the static code, and defect information was gathered from the bugs reported in the repository. Using a correlation method, metrics that had no connection to the defect data were removed, which makes it possible to analyze all the data without pausing the development process. The primary issue with large, sophisticated software is that it is impossible to control everything manually, and the cost of an error can be very high; as a consequence, developers may miss errors during testing, which raises maintenance costs. The overall objective is to find a method that accurately forecasts software defects.
{"title":"Applying Machine Learning Analysis for Software Quality Test","authors":"Al Khan, R. R. Mekuria, Ruslan Isaev","doi":"10.1109/ICCQ57276.2023.10114664","DOIUrl":"https://doi.org/10.1109/ICCQ57276.2023.10114664","url":null,"abstract":"One of the biggest expense in software development is the maintenance. Therefore, it's critical to comprehend what triggers maintenance and if it may be predicted. Numerous research outputs have demonstrated that specific methods of assessing the complexity of created programs may produce useful prediction models to as-certain the possibility of maintenance due to software failures. As a routine it is performed prior to the release, and setting up the models frequently calls for certain, object-oriented software measurements. It's not always the case that software developers have access to these measurements. In this paper, machine learning is applied on the available data to calculate the cumulative software failure levels. A technique to forecast a software's residual defectiveness using machine learning can be looked into as a solution to the challenge of predicting residual flaws. Software metrics and defect data were separated out of the static source code repository. Static code is used to create software metrics, and reported bugs in the repository are used to gather defect information. By using a correlation method, metrics that had no connection to the defect data were removed. This makes it possible to analyze all the data without pausing the programming process. Large, sophisticated software's primary issue is that it is impossible to control everything manually, and the cost of an error can be quite expensive. Developers may miss errors during testing as a consequence, which will raise maintenance costs. Finding a method to accurately forecast software defects is the overall objective.","PeriodicalId":318687,"journal":{"name":"2023 International Conference on Code Quality (ICCQ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129329566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutant Selection Strategies in Mutation Testing
Pub Date : 2023-04-22 DOI: 10.1109/ICCQ57276.2023.10114663
Rowland Pitts
Mutation Testing offers a powerful approach to assessing unit test set quality; however, software developers are often reluctant to embrace the technique because of the tremendous number of mutants it generates, including redundant and equivalent mutants. Researchers have sought strategies to reduce the number of mutants without reducing effectiveness, as well as ways to select more effective mutants, but no strategy has performed better than random mutant selection. Equivalent mutants, which cannot be killed, make achieving mutation adequacy difficult, so most research is conducted under the assumption that unkilled mutants are equivalent. Using 15 java.lang classes known to have truly mutation-adequate test sets, this research demonstrates that even when the number of equivalent mutants is drastically reduced, they remain a tester's largest problem, and that, apart from their presence, achieving mutation adequacy is relatively easy. It also assesses a variety of mutant selection strategies and demonstrates that, even with mutation-adequate test sets, none perform as well as random mutant selection.
{"title":"Mutant Selection Strategies in Mutation Testing","authors":"Rowland Pitts","doi":"10.1109/ICCQ57276.2023.10114663","DOIUrl":"https://doi.org/10.1109/ICCQ57276.2023.10114663","url":null,"abstract":"Mutation Testing offers a powerful approach to assessing unit test set quality; however, software developers are often reluctant to embrace the technique because of the tremendous number of mutants it generates, including redundant and equivalent mutants. Researchers have sought strategies to reduce the number of mutants without reducing effectiveness, and also ways to select more effective mutants, but no strategy has performed better than random mutant selection. Equivalent mutants, which cannot be killed, make achieving mutation adequacy difficult, so most research is conducted with the assumption that unkilled mutants are equivalent. Using 15 java.lang classes that are known to have truly mutation adequate test sets, this research demonstrates that even when the number of equivalent mutants is drastically reduced, they remain a tester's largest problem, and that apart from their presence achieving mutation adequacy is relatively easy. It also assesses a variety of mutant selection strategies and demonstrates that even with mutation adequate test sets, none perform as well as random mutant selection.","PeriodicalId":318687,"journal":{"name":"2023 International Conference on Code Quality (ICCQ)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125190890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What IS Code Quality: Keynote
Pub Date : 2023-04-22 DOI: 10.1109/iccq57276.2023.10114655
{"title":"What IS Code Quality: Keynote","authors":"","doi":"10.1109/iccq57276.2023.10114655","DOIUrl":"https://doi.org/10.1109/iccq57276.2023.10114655","url":null,"abstract":"","PeriodicalId":318687,"journal":{"name":"2023 International Conference on Code Quality (ICCQ)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115549562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}