{"title":"A Tensor Formalism for Computer Science","authors":"Jon Bratseth, H. Pettersen, L. Solbakken","doi":"10.1145/3459104.3459152","DOIUrl":null,"url":null,"abstract":"Over recent years, tensors have emerged as the preferred data structure for model representation and computation in machine learning. However, current tensor models suffer from a lack of a formal basis, where the tensors are treated as arbitrary multidimensional data processed by a large and ever-growing collection of functions added ad hoc. In this way, tensor frameworks degenerate to programming languages with a curiously cumbersome data model. This paper argues that a more formal basis for tensors and their computation brings important benefits. The proposed formalism is based on 1) a strong type system for tensors with named dimensions, 2) a common model of both dense and sparse tensors, and 3) a small, closed set of tensor functions, providing a general mathematical language in which higher level functions can be expressed. These features work together to provide ease of use resulting from static type verification with meaningful dimension names, improved interoperability resulting from defining a closed set of just six foundational tensor functions, and better support for performance optimizations resulting from having just a small set of core functions needing low-level optimizations, and higher-level operations being able to work on arbitrary chunks of these functions, as well as from better mathematical properties from using named tensor dimensions without inherent order. The proposed model is implemented as the model inference engine in the Vespa big data serving engine, where it runs various models expressed in this language directly, as well as models expressed in TensorFlow or Onnx formats.","PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Symposium on Electrical, Electronics and Information Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459104.3459152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Over recent years, tensors have emerged as the preferred data structure for model representation and computation in machine learning. However, current tensor models lack a formal basis: tensors are treated as arbitrary multidimensional data processed by a large and ever-growing collection of functions added ad hoc. Tensor frameworks thereby degenerate into programming languages with a curiously cumbersome data model. This paper argues that a more formal basis for tensors and their computation brings important benefits. The proposed formalism rests on 1) a strong type system for tensors with named dimensions, 2) a common model of both dense and sparse tensors, and 3) a small, closed set of tensor functions providing a general mathematical language in which higher-level functions can be expressed. Together, these features yield ease of use, from static type verification with meaningful dimension names; improved interoperability, from a closed set of just six foundational tensor functions; and better support for performance optimization, since only a small core of functions needs low-level optimization, higher-level operations can be composed from arbitrary combinations of these functions, and named, unordered tensor dimensions give the formalism better mathematical properties. The proposed model is implemented as the model inference engine in the Vespa big data serving engine, which runs models expressed directly in this language as well as models in TensorFlow or ONNX formats.
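The abstract does not enumerate the six foundational functions, but a minimal sketch can still illustrate the central ideas: cells addressed by named dimensions (which makes dense and sparse tensors the same kind of object) and small, composable functions such as a generalized `join` and `reduce`, which the Vespa tensor language includes among its primitives. The Python below is an illustrative toy, not the paper's definition; the `Tensor`, `addr`, `join`, and `reduce` names are assumptions made for this sketch.

```python
# Toy named-dimension tensor in the spirit of the paper's formalism.
# A cell map from named addresses to values covers dense and sparse
# tensors uniformly; `join` and `reduce` sketch two foundational
# functions (the full closed set of six is defined in the paper).
from typing import Callable, Dict, List, Tuple

# An address maps dimension names to labels, stored as a sorted tuple
# so it is hashable and insensitive to dimension order.
Address = Tuple[Tuple[str, str], ...]

def addr(**coords: str) -> Address:
    return tuple(sorted(coords.items()))

class Tensor:
    def __init__(self, cells: Dict[Address, float]):
        self.cells = cells

    def join(self, other: "Tensor",
             f: Callable[[float, float], float]) -> "Tensor":
        """Combine cells that agree on all shared named dimensions;
        the result ranges over the union of the two dimension sets."""
        out: Dict[Address, float] = {}
        for a, va in self.cells.items():
            da = dict(a)
            for b, vb in other.cells.items():
                db = dict(b)
                if all(da[d] == db[d] for d in da.keys() & db.keys()):
                    out[addr(**{**da, **db})] = f(va, vb)
        return Tensor(out)

    def reduce(self, dim: str, f: Callable[[List[float]], float] = sum) -> "Tensor":
        """Aggregate away one named dimension."""
        groups: Dict[Address, List[float]] = {}
        for a, v in self.cells.items():
            rest = tuple(kv for kv in a if kv[0] != dim)
            groups.setdefault(rest, []).append(v)
        return Tensor({k: f(vs) for k, vs in groups.items()})

# Matrix-vector product expressed as join(multiply) then reduce(sum, j):
m = Tensor({addr(i="0", j="0"): 1.0, addr(i="0", j="1"): 2.0,
            addr(i="1", j="0"): 3.0, addr(i="1", j="1"): 4.0})
v = Tensor({addr(j="0"): 5.0, addr(j="1"): 6.0})
mv = m.join(v, lambda x, y: x * y).reduce("j")
print(mv.cells)  # {(('i', '0'),): 17.0, (('i', '1'),): 39.0}
```

Note how nothing in the sketch depends on dimension order or on whether labels are dense indices or sparse string keys, and how a higher-level operation (matrix-vector product) falls out of composing the two primitives rather than needing its own dedicated function.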