arXiv - CS - Programming Languages: Latest Papers

Expressing and Analyzing Quantum Algorithms with Qualtran

arXiv - CS - Programming Languages

Pub Date : 2024-09-06 DOI: arxiv-2409.04643

Matthew P. Harrigan, Tanuj Khattar, Charles Yuan, Anurudh Peduri, Noureldin Yosri, Fionn D. Malone, Ryan Babbush, Nicholas C. Rubin

Quantum computing's transition from theory to reality has spurred the need for novel software tools to manage the increasing complexity, sophistication, toil, and fallibility of quantum algorithm development. We present Qualtran, an open-source library for representing and analyzing quantum algorithms. Using appropriate abstractions and data structures, we can simulate and test algorithms, automatically generate information-rich diagrams, and tabulate resource requirements. Qualtran offers a standard library of algorithmic building blocks that are essential for modern cost-minimizing compilations. Its capabilities are showcased through the re-analysis of key algorithms in Hamiltonian simulation, chemistry, and cryptography. Architecture-independent resource counts output by Qualtran can be forwarded to our implementation of cost models to estimate physical costs like wall-clock time and number of physical qubits assuming a surface-code architecture. Qualtran provides a foundation for explicit constructions and reproducible analysis, fostering greater collaboration within the growing quantum algorithm development community.

Citations: 0

A Brief Overview of the Pawns Programming Language

arXiv - CS - Programming Languages

Pub Date : 2024-09-05 DOI: arxiv-2409.03152

Lee Naish

Pawns is a programming language under development which supports pure functional programming (including algebraic data types, higher order programming and parametric polymorphism) and imperative programming (including pointers, destructive update of shared data structures and global variables), integrated so each can call the other and with purity checked by the compiler. For pure functional code the programmer need not understand the representation of the data structures. For imperative code the representation must be understood and all effects and dependencies must be documented in the code. For example, if a function may update one of its arguments, this must be declared in the function type signature and noted where the function is called. A single update operation may affect several variables due to sharing of representations (pointer aliasing). Pawns code requires all affected variables to be annotated wherever they may be updated and information about sharing to be declared. Annotations are also required where IO or other global variables are used and this must be declared in type signatures as well. Sharing analysis, performed by the compiler, is the key to many aspects of Pawns. It enables us to check that all effects are made obvious in the source code, effects can be encapsulated inside a pure interface and effects can be used safely in the presence of polymorphism.

Citations: 0

Dynamic String Generation and C++-style Output in Fortran

arXiv - CS - Programming Languages

Pub Date : 2024-09-05 DOI: arxiv-2409.03397

Marcus Mohr

Using standard components of modern Fortran we present a technique to dynamically generate strings with as little coding overhead as possible on the application side. Additionally we demonstrate how this can be extended to allow for output generation with a C++ stream-like look and feel.

Citations: 0

The MLIR Transform Dialect. Your compiler is more powerful than you think

arXiv - CS - Programming Languages

Pub Date : 2024-09-05 DOI: arxiv-2409.03864

Martin Lücke, Oleksandr Zinenko, William S. Moses, Michel Steuwer, Albert Cohen

To take full advantage of a specific hardware target, performance engineers need to gain control on compilers in order to leverage their domain knowledge about the program and hardware. Yet, modern compilers are poorly controlled, usually by configuring a sequence of coarse-grained monolithic black-box passes, or by means of predefined compiler annotations/pragmas. These can be effective, but often do not let users precisely optimize their varying compute loads. As a consequence, performance engineers have to resort to implementing custom passes for a specific optimization heuristic, requiring compiler engineering expert knowledge. In this paper, we present a technique that provides fine-grained control of general-purpose compilers by introducing the Transform dialect, a controllable IR-based transformation system implemented in MLIR. The Transform dialect empowers performance engineers to optimize their various compute loads by composing and reusing existing - but currently hidden - compiler features without the need to implement new passes or even rebuilding the compiler. We demonstrate in five case studies that the Transform dialect enables precise, safe composition of compiler transformations and allows for straightforward integration with state-of-the-art search methods.

Citations: 0

Register Aggregation for Hardware Decompilation

arXiv - CS - Programming Languages

Pub Date : 2024-09-04 DOI: arxiv-2409.03119

Varun Rao, Zachary D. Sisco

Hardware decompilation reverses logic synthesis, converting a gate-level digital electronic design, or netlist, back up to hardware description language (HDL) code. Existing techniques decompile data-oriented features in netlists, like loops and modules, but struggle with sequential logic. In particular, they cannot decompile memory elements, which pose difficulty due to their deconstruction into individual bits and the feedback loops they form in the netlist. Recovering multi-bit registers and memory blocks from netlists would expand the applications of hardware decompilation, notably towards retargeting technologies (e.g. FPGAs to ASICs) and decompiling processor memories. We devise a method for register aggregation, to identify relationships between the data flip-flops in a netlist and group them into registers and memory blocks, resulting in HDL code that instantiates these memory elements. We aggregate flip-flops by identifying common enable pins, and derive the bit-order of the resulting registers using functional dependencies. This scales similarly to memory blocks, where we repeat the algorithm in the second dimension with special attention to the read, write, and address ports of each memory block. We evaluate our technique over a dataset of 13 gate-level netlists, comprising circuits from binary multipliers to CPUs, and we compare the quantity and widths of recovered registers and memory blocks with the original source code. The technique successfully recovers memory elements in all of the tested circuits, even aggregating beyond the source code expectation. In 10 / 13 circuits, all source code memory elements are accounted for, and we are able to compact up to 2048 disjoint bits into a single memory block.
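
The grouping step described in the abstract (aggregating flip-flops that share an enable pin) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `FlipFlop` record, its field names, and the toy netlist are hypothetical, and the bit-order recovery via functional dependencies is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlop:
    """One data flip-flop from a netlist (hypothetical, simplified model)."""
    name: str    # instance name in the netlist
    enable: str  # net driving the enable pin
    d: str       # net driving the data input
    q: str       # net driven by the output


def aggregate_registers(flops):
    """Group flip-flops that share an enable net into candidate registers.

    Returns a dict mapping each enable net to the names of the flip-flops
    it controls, in netlist order. Bit order is NOT recovered here.
    """
    groups = defaultdict(list)
    for ff in flops:
        groups[ff.enable].append(ff.name)
    return dict(groups)


# Toy netlist (assumed data): three bits share en_a, one bit uses en_b.
netlist = [
    FlipFlop("ff0", enable="en_a", d="n1", q="a0"),
    FlipFlop("ff1", enable="en_a", d="n2", q="a1"),
    FlipFlop("ff2", enable="en_b", d="n3", q="b0"),
    FlipFlop("ff3", enable="en_a", d="n4", q="a2"),
]

registers = aggregate_registers(netlist)
for enable, members in registers.items():
    print(f"{enable}: {len(members)}-bit register candidate {members}")
```

Keying the groups on the enable net alone is the simplest reading of "common enable pins"; a real netlist would likely also need clock and reset nets in the key, which is an assumption beyond what the abstract states.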

Citations: 0

CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming

arXiv - CS - Programming Languages

Pub Date : 2024-09-04 DOI: arxiv-2409.02771

Ethan Chen, Jiwon Chang, Yuhao Zhu

Color programmers manipulate lights, materials, and the resulting colors from light-material interactions. Existing libraries for color programming provide only a thin layer of abstraction around matrix operations. Color programs are, thus, vulnerable to bugs arising from mathematically permissible but physically meaningless matrix computations. Correct implementations are difficult to write and optimize. We introduce CoolerSpace to facilitate physically correct and computationally efficient color programming. CoolerSpace raises the level of abstraction of color programming by allowing programmers to focus on describing the logic of color physics. Correctness and efficiency are handled by CoolerSpace. The type system in CoolerSpace assigns physical meaning and dimensions to user-defined objects. The typing rules permit only legal computations informed by color physics and perception. Along with type checking, CoolerSpace also generates performance-optimized programs using equality saturation. CoolerSpace is implemented as a Python library and compiles to ONNX, a common intermediate representation for tensor computations. CoolerSpace not only prevents common errors in color programming, but also does so without run-time overhead: even unoptimized CoolerSpace programs out-perform existing Python-based color programming systems by up to 5.7 times; our optimizations provide up to an additional 1.4 times speed-up.
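
To make the idea of "mathematically permissible but physically meaningless" computations concrete, here is a small Python sketch of physically typed values. It does not use CoolerSpace's actual API: the `LightSpectrum` and `Reflectance` classes and their fields are hypothetical names chosen for illustration, and the check happens at run time rather than in a static type system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LightSpectrum:
    """Spectral power of a light, one sample per wavelength band (illustrative)."""
    power: Tuple[float, ...]

    def __add__(self, other):
        # Mixing two lights is physically meaningful: powers add per band.
        if not isinstance(other, LightSpectrum):
            return NotImplemented
        return LightSpectrum(tuple(a + b for a, b in zip(self.power, other.power)))

    def __mul__(self, other):
        # Light reflecting off a material: scale each band by its reflectance.
        if not isinstance(other, Reflectance):
            return NotImplemented
        return LightSpectrum(tuple(p * r for p, r in zip(self.power, other.ratio)))


@dataclass(frozen=True)
class Reflectance:
    """Fraction of light a material reflects per band (illustrative).

    No __add__ is defined on purpose: summing two reflectances is
    element-wise valid arithmetic but has no physical meaning here.
    """
    ratio: Tuple[float, ...]


white = LightSpectrum((1.0, 2.0, 4.0))
paint = Reflectance((0.5, 0.25, 0.5))

reflected = white * paint  # legal: light times reflectance yields light

try:
    paint + paint          # rejected: no physical interpretation
    rejected = False
except TypeError:
    rejected = True
```

With plain arrays both operations would succeed silently; attaching a physical type to each value lets the illegal one fail early, which is the class of bug the abstract describes.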

Citations: 0

Sharing Analysis in the Pawns Compiler

arXiv - CS - Programming Languages

Pub Date : 2024-09-04 DOI: arxiv-2409.02398

Lee Naish

Pawns is a programming language under development that supports algebraic data types, polymorphism, higher order functions and "pure" declarative programming. It also supports impure imperative features including destructive update of shared data structures via pointers, allowing significantly increased efficiency for some operations. A novelty of Pawns is that all impure "effects" must be made obvious in the source code and they can be safely encapsulated in pure functions in a way that is checked by the compiler. Execution of a pure function can perform destructive updates on data structures that are local to or eventually returned from the function without risking modification of the data structures passed to the function. This paper describes the sharing analysis which allows impurity to be encapsulated. Aspects of the analysis are similar to other published work, but in addition it handles explicit pointers and destructive update, higher order functions including closures and pre- and post-conditions concerning sharing for functions.

Citations: 0

BinSub: The Simple Essence of Polymorphic Type Inference for Machine Code

arXiv - CS - Programming Languages

Pub Date : 2024-09-03 DOI: arxiv-2409.01841

Ian Smith

Recovering high-level type information in binaries is a key task in reverse engineering and binary analysis. Binaries contain very little explicit type information. The structure of binary code is incredibly flexible allowing for ad-hoc subtyping and polymorphism. Prior work has shown that precise type inference on binary code requires expressive subtyping and polymorphism. Implementations of these type system features in a binary type inference algorithm have thus-far been too inefficient to achieve widespread adoption. Recent advances in traditional type inference have achieved simple and efficient principal type inference in an ML like language with subtyping and polymorphism through the framework of algebraic subtyping. BinSub, a new binary type inference algorithm, recognizes the connection between algebraic subtyping and the type system features required to analyze binaries effectively. Using this connection, BinSub achieves simple, precise, and efficient binary type inference. We show that BinSub maintains a similar precision to prior work, while achieving a 63x improvement in average runtime for 1568 functions. We also present a formalization of BinSub and show that BinSub's type system maintains the expressiveness of prior work.

Citations: 0

An Array Intermediate Language for Mixed Cryptography

arXiv - CS - Programming Languages

Pub Date : 2024-09-03 DOI: arxiv-2409.01587

Vivian Ding, Coşku Acay, Andrew C. Myers

We introduce AIRduct, a new array-based intermediate representation designed to support generating efficient code for interactive programs employing multiple cryptographic mechanisms. AIRduct is intended as an IR for the Viaduct compiler, which can synthesize secure, distributed programs with an extensible suite of cryptography. Therefore, AIRduct supports an extensible variety of cryptographic mechanisms, including MPC and ZKP.

Citations: 0

The ART of Sharing Points-to Analysis (Extended Abstract)

arXiv - CS - Programming Languages

Pub Date : 2024-09-03 DOI: arxiv-2409.09062

Shashin Halalingaiah, Vijay Sundaresan, Daryl Maier, V. Krishna Nandivada

Data-flow analyses like points-to analysis can vastly improve the precision of other analyses, and help perform powerful code optimizations. However, whole-program points-to analysis of large programs tend to be expensive - both in terms of time and memory. Consequently, many compilers (both static and JIT) and program-analysis tools tend to employ faster - but more conservative - points-to analysis to improve usability. As an alternative to such trading of precision for performance, various techniques have been proposed to perform precise yet expensive fixed-point points-to analyses ahead of time in a static analyzer, store the results, and then transmit them to independent compilation/program-analysis stages that may need them. However, an underlying concern of safety affects all such techniques - can a compiler (or program analysis tool) trust the points-to analysis results generated by another compiler/tool? In this work, we address this issue of trust, while keeping the issues of performance efficiency in mind. We propose ART: Analysis-results Representation Template - a novel scheme to efficiently and concisely encode results of flow-sensitive, context-insensitive points-to analysis computed by a static analyzer for use in any independent system that may benefit from such a highly precise points-to analysis. Our scheme has two components: (i) a producer that can statically perform expensive points-to analysis and encode the same concisely. (ii) a consumer that, on receiving such encoded results, can regenerate the points-to analysis results encoded by the artwork if it is deemed safe. We demonstrate the usage of ART by implementing a producer (in Soot) and two consumers (in Soot and the Eclipse OpenJ9 JIT compiler). We evaluate our implementation over various benchmarks from the DaCapo and SPECjvm2008 suites.

Citations: 0
