2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP): Latest Papers

A Static Analysis Framework for Data Science Notebooks

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2021-10-15 DOI: 10.1145/3510457.3513032

Pavle Subotić, Lazar Milikić, M. Stojic

Notebooks provide an interactive environment in which programmers can develop code, analyse data and embed interleaved visualisations. Despite this flexibility, a major pitfall that data scientists encounter is unexpected behaviour caused by the unique out-of-order execution model of notebooks. As a result, data scientists face challenges ranging from notebook correctness and reproducibility to cleaning. In this paper, we propose a framework that performs static analysis on notebooks, incorporating their unique execution semantics. Our framework is general in the sense that it accommodates a wide range of analyses, useful for various notebook use cases. We have instantiated our framework on a diverse set of analyses and have evaluated them on 2211 real-world notebooks. Our evaluation demonstrates that the vast majority (98.7%) of notebooks can be analysed in less than a second, well within the time frame required by interactive notebook clients.
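To make the out-of-order execution pitfall concrete, the sketch below checks whether a given cell execution order reads a name before any executed cell has defined it. It is only an illustration built on Python's standard `ast` module, not the framework described in the paper; the helper names `cell_defs_uses` and `unresolved_names` are hypothetical, and the analysis is deliberately crude (it ignores scoping and treats a name that is both read and written in one cell as locally defined).

```python
import ast
import builtins


def cell_defs_uses(source):
    """Approximate the names a cell defines and the names it needs from elsewhere."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Store context means assignment; anything else is treated as a read.
            (defs if isinstance(node.ctx, ast.Store) else uses).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, ast.arg):
            defs.add(node.arg)  # function parameters are local, not external reads
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
    # Scope-insensitive simplification: names defined in the cell or built in
    # are not counted as external dependencies.
    uses -= defs
    uses -= set(dir(builtins))
    return defs, uses


def unresolved_names(cells, order):
    """Map each cell index in `order` to the names it reads that no earlier-run cell defined."""
    available, problems = set(), {}
    for idx in order:
        defs, uses = cell_defs_uses(cells[idx])
        missing = sorted(uses - available)
        if missing:
            problems[idx] = missing
        available |= defs
    return problems


cells = [
    "import math\nradius = 2.0",
    "area = math.pi * radius ** 2",
    "print(area)",
]
print(unresolved_names(cells, [0, 1, 2]))  # top-to-bottom order
print(unresolved_names(cells, [0, 2, 1]))  # third cell run before the second
```

Running the cells top to bottom reports no problems, while running the third cell before the second flags `area` as undefined at that point. A real analysis would also need to track stale state left behind by re-executed or deleted cells, which this sketch does not attempt.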

Citations: 11

Mining Idioms in the Wild

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2021-07-13 DOI: 10.1145/3510457.3513046

Aishwarya Sivaraman, Rui Abreu, Andrew C. Scott, Tobi Akomolede, S. Chandra

Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express the same idiomatic imperative code much more succinctly. However, those code patterns linger in repositories because the developers may be unaware of the new APIs or have not gotten around to them. Detection of idiomatic code can also point to the need for new APIs. We share our experiences in mining imperative idiomatic patterns from the Hack repo at Facebook. We found that existing techniques either cannot identify meaningful patterns from syntax trees or require test-suite-based dynamic analysis to incorporate semantic properties to mine useful patterns. The key insight of the approach proposed in this paper – Jezero – is that semantic idioms from a large codebase can be learned from canonicalized dataflow trees. We propose a scalable, lightweight static analysis-based approach to construct such a tree that is well suited to mine semantic idioms using nonparametric Bayesian methods. Our experiments with Jezero on Hack code show a clear advantage of adding canonicalized dataflow information to ASTs: Jezero was significantly more effective in finding new refactoring opportunities from unannotated legacy code than a baseline that did not have the dataflow augmentation.
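As a rough illustration of why canonicalization helps pattern mining, the sketch below renames variables in a Python syntax tree to positional placeholders so that two loops differing only in identifier names produce the same tree. This is not Jezero and does not build dataflow trees; it only shows the simpler idea of comparing code modulo naming, using the standard `ast` module. The names `_Canonicalize` and `canonical_form` are hypothetical.

```python
import ast
import builtins

_BUILTINS = set(dir(builtins))


class _Canonicalize(ast.NodeTransformer):
    """Rename each non-builtin variable to v0, v1, ... in order of first visit."""

    def __init__(self):
        self.names = {}

    def visit_Name(self, node):
        if node.id in _BUILTINS:
            return node  # keep len, range, sum, ... distinguishable
        new_id = self.names.setdefault(node.id, f"v{len(self.names)}")
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


def canonical_form(source):
    """Return a string form of the syntax tree with variable names normalized."""
    tree = _Canonicalize().visit(ast.parse(source))
    return ast.dump(tree)  # line and column attributes are omitted by default


loop_a = "out = []\nfor x in items:\n    if x > 0:\n        out.append(x * 2)\n"
loop_b = "res = []\nfor elem in data:\n    if elem > 0:\n        res.append(elem * 2)\n"
loop_c = "res = []\nfor elem in data:\n    res.append(elem)\n"

print(canonical_form(loop_a) == canonical_form(loop_b))
print(canonical_form(loop_a) == canonical_form(loop_c))
```

Here `loop_a` and `loop_b` collapse to the same canonical form, so a miner could count them as two occurrences of one pattern, while `loop_c` stays distinct. The abstract's point is that purely syntactic trees are not enough, which is why the authors add canonicalized dataflow information on top.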

Citations: 3

Improving Code Autocompletion with Transfer Learning

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2021-05-12 DOI: 10.1145/3510457.3513061

Wenjie Zhou, Seohyun Kim, V. Murali, Gareth Ari Aye

Software language models have achieved promising results in predicting code completions, and several industry studies have described successful IDE integration. Recently, accuracy in autocompletion prediction improved by 12.8% [2] from training on a real-world dataset collected from programmers' IDE activities. But what if the number of examples of IDE autocompletion in the target programming language is inadequate for model training? In this paper, we highlight practical reasons for this inadequacy, and call for the use of transfer learning to overcome the issue.

Citations: 8
