Computer and mobile users often call everything that disturbs or corrupts their system a VIRUS, without being aware of what the term actually means or what such software does. This tutorial systematically introduces the different malware varieties, their distinctive properties, the different methods of analyzing malware, and the techniques used to detect it.
{"title":"Malware Analysis and Detection","authors":"Hemant Rathore, Mohit Sewak","doi":"10.1145/3564121.3564809","DOIUrl":"https://doi.org/10.1145/3564121.3564809","url":null,"abstract":"Often computer/mobile users call everything that disturbs/corrupts their system a VIRUS without being aware of what it means or accomplishes. This tutorial systematically introduces the different malware varieties, their distinctive properties, different methods of analyzing the malware, and their detection techniques.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116724871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kavya Borra, Ashwin Krishnan, H. Khadilkar, M. Nambiar, Ansuma Basumatary, Rekha Singhal, A. Mukherjee
Online 3D bin packing is a challenging real-time combinatorial optimisation problem that involves packing parcels (typically rigid cuboids) arriving on a conveyor into a larger bin for further shipment. Recent automation methods have introduced manipulator robots for packing, which need a processing algorithm to specify the location and orientation in which each parcel must be loaded. Value-based reinforcement learning (RL) algorithms such as DQN are capable of producing good solutions within the available computation time. However, their deployment on CPU-based systems relies on rule-based heuristics to reduce the search space, which may lead to sub-optimal solutions. In this paper, we use an FPGA as a hardware accelerator to reduce the inference time of DQN as well as its pre- and post-processing steps. This allows the optimised algorithm to cover the entire search space within the given time constraints. We present various optimizations, such as accelerating DQN model inference and fast checking of constraints. Further, we show that our proposed architecture achieves almost 15x computational speed-up compared to an equivalent CPU implementation. Additionally, we show that, as a result of evaluating the entire search space, the DQN rewards generated for complex data sets improve by 1%, which can translate into a significant reduction in enterprise operating costs.
{"title":"Performance improvement of reinforcement learning algorithms for online 3D bin packing using FPGA","authors":"Kavya Borra, Ashwin Krishnan, H. Khadilkar, M. Nambiar, Ansuma Basumatary, Rekha Singhal, A. Mukherjee","doi":"10.1145/3564121.3564795","DOIUrl":"https://doi.org/10.1145/3564121.3564795","url":null,"abstract":"Online 3D bin packing is a challenging real-time combinatorial optimisation problem that involves packing of parcels (typically rigid cuboids) arriving on a conveyor into a larger bin for further shipment. Recent automation methods have introduced manipulator robots for packing, which need a processing algorithm to specify the location and orientation in which each parcel must be loaded. Value-based Reinforcement learning (RL) algorithms such as DQN are capable of producing good solutions in the available computation times. However, their deployment on CPU based systems employs rule-based heuristics to reduce the search space which may lead to a sub-optimal solution. In this paper, we use FPGA as a hardware accelerator to reduce inference time of DQN as well as its pre-/post-processing steps. This allows the optimised algorithm to cover the entire search space within the given time constraints. We present various optimizations, such as accelerating DQN model inference and fast checking of constraints. Further, we show that our proposed architecture achieves almost 15x computational speed-ups compared to an equivalent CPU implementation. Additionally, we show that as a result of evaluating the entire search space, the DQN rewards generated for complex data sets has improved by 1%, which can cause a significant reduction in enterprise operating costs.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122765541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sujoy Roy Chowdhury, Serene Banerjee, Ranjani H. G., Chaitanya Kapoor
Telecommunications networks operate on enormous amounts of time-series data and often exhibit anomalous trends in their behaviour, caused by increased latency and reduced throughput in the network, which inevitably leads to poor customer experience [17]. One of the common machine learning problems in the telecom domain is to predict anomalous behaviour ahead of time. Whilst this is a well-researched problem, far less work has been done on identifying causal structures from the temporal patterns of the various Key Performance Indicators (KPIs) in the telecom network. The ability to identify causal structures from anomalous behaviours would allow more effective intervention and generalisation across different environments and networks. The tutorial focuses on existing frameworks for causal discovery on time-series data sets. In this hands-on tutorial, we will cover at least three state-of-the-art (SOTA) methods for causal time-series analysis, including Granger causality [8], convergent cross-mapping (CCM) [4, 10, 15], Peter-Clark Momentary Conditional Independence (PC-MCI) [6, 14] and the Temporal Causal Discovery Framework (TCDF) [11]. The need for causation analysis [7] beyond correlation will also be explained using publicly available datasets, such as the double pendulum dataset [1]. The methods are chosen to cover various aspects of causal time-series analysis: modelling non-linearity (non-linear Granger causality), approaching the problem from chaos and dynamical systems (CCM), information-theoretic approaches (PC-MCI), and data-driven approaches (TCDF). State-of-the-art survey papers [2, 12] show that none of the methods is ideal for all possible time series, and each has relative advantages and shortcomings.
{"title":"Identification of Causal Dependencies in Multivariate Time Series","authors":"Sujoy Roy Chowdhury, Serene Banerjee, Ranjani H. G., Chaitanya Kapoor","doi":"10.1145/3564121.3564810","DOIUrl":"https://doi.org/10.1145/3564121.3564810","url":null,"abstract":"Telecommunications networks operate on enormous amount of time-series data, and often exhibit anomalous trends in their behaviour. This is caused due to increased latency and reduced throughput in the network which inevitably leads to poor customer experience [17]. One of the common problems in machine learning in the telecom domain is to predict anomalous behaviour ahead of time. Whilst this is a well-researched problem, there is far less work done in identifying causal structures from the temporal patterns of various Key Performance Indicators (KPI) in the telecom network. The ability to identify causal structures from anomalous behaviours would allow more effective intervention and generalisation of different environments and networks. The tutorial is focused on discussing existing frameworks for establishing causal discovery for time-series data sets. In this hands-on tutorial, we will be covering at least 3 state-of-the-art (SOTA) methods on causal time series analysis including Granger causality[8],convergent cross-mapping [4, 10, 15], Peter-Clark Momentary Conditional Independence (PC-MCI) [6, 14] and Temporal Causal discovery framework (TCDF)[11]. The need for a causation analysis[7], beyond correlation will also be explained using publicly available datasets, such as, double pendulum dataset [1]. The state-of-art methods are chosen to cover various aspects of the causal time series analysis, such as modelling the non-linearity (non-linear Granger Causality), attempting the problem from chaos and dynamic systems (CCM), information-theoretic approaches (PC-MCI, or having a data-driven approach (TCDF). State-of-the-art survey papers [2, 12] show that none of the methods can be said to be ideal for all the possible time series and there are relative advantages and shortcomings for each of these methods.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"410 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115742466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting distribution shift is required to prevent a machine learning model from degrading in performance, and to keep human-mediated data analysis from reaching erroneous conclusions. For comparing unknown distributions of high-dimensional data, histograms are suitable density estimators due to their computational efficiency. It is important for histograms used in distribution shift detection to have uniform density, as demonstrated by existing tree-based or cluster-based histograms. However, existing histograms do not consider generalization to out-of-sample data, which degrades detection performance at test time. In this paper, we propose a neural-based histogram for distribution shift detection that generalizes well to out-of-sample data. The bins of the histogram are determined by a model trained to discriminate between a handful of reference instances, which reflects their underlying distribution. Due to a batch-wise maximum entropy regularizer calculated from a bootstrap sample, the bins, as subsets of the feature space partitioned by the decision boundaries of the model, generalize, and thus the histogram keeps its density uniform for out-of-sample data. We evaluate our method on a distribution shift detection task using multi-domain real-world datasets. The results show that our method outperforms state-of-the-art histogram-based methods.
{"title":"RIDEN: Neural-based Uniform Density Histogram for Distribution Shift Detection","authors":"Kei Yonekawa, Kazuhiro Saito, Mori Kurokawa","doi":"10.1145/3564121.3564136","DOIUrl":"https://doi.org/10.1145/3564121.3564136","url":null,"abstract":"It is required to detect distribution shift in order to prevent a machine learning model from performance degradation, and human-mediated data analysis from erroneous conclusions. For the purpose of comparing between unknown distributions of high-dimensional data, histograms are suitable density estimators due to its computational efficiency. It is important for histograms for distribution shift detection to have uniform density, which has been demonstrated in existing tree-based or cluster-based histograms. However, existing histograms do not consider generalization capability to out-of-sample data, resulting in degraded detection performance at test time. In this paper, we propose a neural-based histogram for distribution shift detection, which generalizes well to out-of-sample data. The bins of histogram are determined by a model trained to discriminate between a handful reference instances, which reflects their underlying distribution. Due to the batch-wise maximum entropy regularizer calculated from a bootstrap sample, the bins as a subset of the feature space partitioned by the decision boundaries of the model generalize, and thus the histogram keeps its density uniform for out-of-sample data. We evaluate our method on distribution shift detection task using multi-domain real-world datasets. The results show that our method outperforms state-of-the-art histogram-based methods.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115904571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Product images are the most compelling medium of customer interaction on the product detail pages of e-commerce websites. Millions of products are onboarded onto webstore catalogues daily, and maintaining a high quality bar for a product's set of images is a problem at scale. Grouping products by category, clothing is a very high-volume and high-velocity category and thus deserves its own attention. At this scale it is challenging to monitor the completeness of the image set that adequately details the product for consumers, and incomplete image sets often lead to a poor customer experience and customer drop-off. To supervise the quality and completeness of the images on the product pages for these product types and to suggest improvements, we propose a Human Pose Detection based unsupervised method that scans the image set of a product for missing poses. The unsupervised approach treats sellers fairly across products and categories, irrespective of any biases. We first create a reference image set of popular products with complete image sets. We then cluster the images to label the most desirable poses, which form the classes of the reference set derived from these ideal products. Further, for every test product we scan its images for all desired pose classes with respect to the reference-set poses, determine the missing ones, and sort them in order of potential impact. These missing poses can then be used by sellers to add enriched product listing images. We gathered data from a popular online webstore and manually surveyed ~200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (~20K images), a significant proportion of which had scope for adding many image variants, whereas highly rated products had more than double the image variants, indicating that our model can potentially be used at large scale.
{"title":"Unposed: Unsupervised Pose Estimation based Product Image Recommendations","authors":"Saurabh Sharma, Faizan Ahemad","doi":"10.1145/3564121.3564126","DOIUrl":"https://doi.org/10.1145/3564121.3564126","url":null,"abstract":"Product images are the most impressing medium of customer interaction on the product detail pages of e-commerce websites. Millions of products are onboarded on to webstore catalogues daily and maintaining a high quality bar for a product’s set of images is a problem at scale. Grouping products by categories, clothing is a very high volume and high velocity category and thus deserves its own attention. Given the scale it is challenging to monitor the completeness of image set, which adequately details the product for the consumers, which in turn often leads to a poor customer experience and thus customer drop off. To supervise the quality and completeness of the images in the product pages for these product types and suggest improvements, we propose a Human Pose Detection based unsupervised method to scan the image set of a product for the missing ones. The unsupervised approach suggests a fair approach to sellers based on product and category irrespective of any biases. We first create a reference image set of popular products with wholesome imageset. Then we create clusters of images to label most desirable poses to form the classes for the reference set from these ideal products set. Further, for all test products we scan the images for all desired pose classes w.r.t. reference set poses, determine the missing ones and sort them in the order of potential impact. These missing poses can further be used by the sellers to add enriched product listing image. We gathered data from popular online webstore and surveyed ~200 products manually, a large fraction of which had at least 1 repeated image or missing variant, and sampled 3K products(~20K images) of which a significant proportion had scope for adding many image variants as compared to high rated products which had more than double image variants, indicating that our model can potentially be used on a large scale.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125425807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anirban I Ghosh, Radhika Sharma, Karan Goyal, Balakarthikeyan Rajan, S. Mani
Businesses are increasingly reliant on Machine Learning models to manage user experiences. It becomes important not only to focus on building robust, state-of-the-art models but also to continuously monitor and evaluate them. Continuous monitoring enables the AI team to ensure the right frequency of model training and to proactively investigate erroneous patterns and predictions before they have a wider business impact. A robust and effective monitoring system is thus needed to ensure business and engineering teams are aware of model performance and of any data anomalies which could impact downstream model accuracy. In this paper, we present our Health Assurance model monitoring solution. Currently, the system serves the health monitoring needs of more than 250 models across 11 AI verticals with an average anomaly detection precision of 60%.
{"title":"Health Assurance: AI Model Monitoring Platform","authors":"Anirban I Ghosh, Radhika Sharma, Karan Goyal, Balakarthikeyan Rajan, S. Mani","doi":"10.1145/3564121.3564798","DOIUrl":"https://doi.org/10.1145/3564121.3564798","url":null,"abstract":"Businesses are increasingly reliant on Machine Learning models to manage user experiences. It becomes important to not only focus on building robust and state-of-the-art models but also continuously monitor and evaluate them. Continuous monitoring enables the AI team to ensure the right frequency of model training and pro-actively investigate erroneous patterns and predictions, before it has a wider business impact. A robust and effective monitoring system is thus needed to ensure business and engineering teams are aware of model performance and any data anomalies which could impact downstream model accuracy. In this paper, we present our Health Assurance model monitoring solution. Currently, the system serves the health monitoring needs of more than 250 models across 11 AI verticals with an average anomaly detection precision of 60%.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117314480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we propose a Deep Learning (DL) based error correction system termed DEC. It predicts the transmitted symbols at the receiver using the received soft symbols and the channel state information (CSI) of the transmission link. Hence, the proposed system eliminates the need for complex channel coding/decoding blocks in the wireless communication system. Specifically, we explore the application of the proposed DEC system to Spatial Modulation-OFDM (SM-OFDM) systems. SM is a technique that avoids inter-channel interference (ICI) at the receiver input and also offers a good balance between energy and spectral efficiency. Together with the DEC system, this can prove to be of interest for next-generation wireless systems, particularly for Internet-of-Things (IoT) devices that require optimal bit-error ratios (BER) at moderate data rates. The performance of the proposed system is compared with a Trellis-coded SM (TCSM) system. The obtained simulation results verify the superiority of the DEC-aided SM-OFDM system over TCSM in terms of both BER and throughput.
{"title":"DEC-aided SM-OFDM: A Spatial Modulation System with Deep Learning based Error Correction","authors":"H. Verma, V. Bohara, Anubha Gupta","doi":"10.1145/3564121.3564131","DOIUrl":"https://doi.org/10.1145/3564121.3564131","url":null,"abstract":"In this work, we propose a Deep Learning (DL) based error correction system termed as DEC. It predicts the transmitted symbols at the receiver using the received soft symbols and channel state information (CSI) of the transmission link. Hence, the proposed system eliminates the need of using complex channel coding/decoding blocks in the wireless communication system. Specifically, we explore the application of proposed DEC system for Spatial Modulation-OFDM (SM-OFDM) systems. SM is a technique that avoids inter-channel interference (ICI) at receiver input, also offers a good balance between the energy and spectral efficiency. This together with DEC system can prove to be of interest for the next generation wireless system, particularly for the Internet-of-Things (IoT) devices that require optimal bit-error ratios (BER) at moderate data rates. The performance of the proposed system is compared with Trellis coded-SM (TCSM) system. The obtained simulation results successfully verify the superiority of the DEC-aided SM-OFDM system over the TCSM in terms of both BER and throughput.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126121536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Neural Networks (DNNs) have made machine learning accessible to a wide set of practitioners working on field deployment of analytics algorithms over sensor data. Along with this, the focus on data privacy, low-latency inference, and sustainability has highlighted the need for efficient in-situ analytics close to the sensors, at the edge of the network, which is challenging given the constrained nature of edge platforms, including Common Off-the-Shelf (COTS) AI accelerators. Efficient DNN model partitioning across multiple edge nodes is a well-studied approach, but no definitive characterization exists of why DNN model partitioning improves performance, or of whether the benefits hold for currently used edge hardware and state-of-the-art DNN models. In this paper, we present a detailed study and analysis to address the above-mentioned shortcomings, and propose a framework that automatically determines the best partitioning scheme and enhances system efficiency.
{"title":"Automated Deep Learning Model Partitioning for Heterogeneous Edge Devices","authors":"Arijit Mukherjee, Swarnava Dey","doi":"10.1145/3564121.3564796","DOIUrl":"https://doi.org/10.1145/3564121.3564796","url":null,"abstract":"Deep Neural Networks (DNN) have made machine learning accessible to a wide set of practitioners working with field deployment of analytics algorithms over sensor data. Along with it, focus on data privacy, low latency inference, and sustainability has highlighted the need for efficient in-situ analytics close to sensors, at the edge of the network, which is challenging given the constrained nature of the edge platforms, including Common Off-the-Shelf (COTS) AI accelerators. Efficient DNN model partitioning across multiple edge nodes is a well-studied approach, but no definitive characterization exists as to why there is a performance improvement due to DNN model partitioning, and whether the benefits hold for currently used edge hardware & state-of-the-art DNN models. In this paper, we present a detailed study and analyses to address the above-mentioned shortcomings and propose a framework that automatically determines the best partitioning scheme and enhances system efficiency.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128266808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chinmay Mahajan, Ashwin Krishnan, M. Nambiar, Rekha Singhal
We see two trends emerging from the exponential increase in AI research: rising adoption of AI-based models in enterprise applications, and the development of different types of hardware accelerators with varying memory and computing architectures for accelerating AI workloads. Accelerators may have different types of memory, varying in access latency and storage capacity. A recommendation model's inference latency is highly influenced by the time to fetch embeddings from the embedding tables. In this paper, we present Hetero-Rec, a framework for the optimal deployment of embeddings for faster inference of recommendation models. The main idea is to cache frequently accessed embeddings in faster memories to reduce average latency during inference. Hetero-Rec uses a performance-model-based optimization algorithm and a spline-based learned index to determine the optimal reservation of portions of the embedding tables across the different memory types available for deployment, based on their past access patterns. We validate our approach for heterogeneous memory architectures, such as URAM (Ultra-Random Access Memory), BRAM (Block Random Access Memory), HBM (High-Bandwidth Memory), and DDR (Double Data Rate) memory, on a server platform with an FPGA accelerator. We observe that the presented optimization algorithm for dynamic placement of embedding tables reduces average latency by up to 1.52x, 1.68x, and 2.91x for the weekly, daily, and hourly access patterns in the transaction history, respectively, as compared to state-of-the-art systems.
{"title":"Hetero-Rec: Optimal Deployment of Embeddings for High-Speed Recommendations","authors":"Chinmay Mahajan, Ashwin Krishnan, M. Nambiar, Rekha Singhal","doi":"10.1145/3564121.3564134","DOIUrl":"https://doi.org/10.1145/3564121.3564134","url":null,"abstract":"We see two trends emerging due to exponential increase in AI research- rise in adoption of AI based models in enterprise applications and development of different types of hardware accelerators with varying memory and computing architectures for accelerating AI workloads. Accelerators may have different types of memories, varying on access latency and storage capacity. A recommendation model’s inference latency is highly influenced by the time to fetch embeddings from the embedding tables. In this paper, we present Hetero-Rec, a framework for optimal deployment of embeddings for faster inference of recommendation model. The main idea is to cache frequently accessed embeddings on faster memories to reduce average latency during inference. Hetero-Rec uses performance model-based optimization algorithm and use of spline based learned index for determining the optimal reservation of portions of embedding tables across different memory types available for deployment, based on their past access patterns. We validate our approach for heterogeneous memory architectures, such as URAM (Ultra-Random Access Memory), BRAM (Block Random Access Memory), HBM (High-Bandwidth Memory) and DDR (Double Data Rate) on a server platform with an FPGA accelerator. We observe that the presented optimization algorithm for dynamic placement of embedding tables yields a reduction on average latency of up to 1.52x, 1.68x, and 2.91x for the weekly, daily, and hourly access patterns, respectively in the transaction history as compared to the state-of-the-art systems.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"66 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133488632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arijit Mukherjee, A. Ukil, Swarnava Dey, Gitesh Kulkarni
Resource-constrained platforms such as microcontrollers are the workhorses of embedded systems, deployed to capture data from sensors and send the collected data to the cloud for processing. Recently, great interest has been seen in the research community and industry in using these devices to perform Artificial Intelligence/Machine Learning (AI/ML) inference tasks in areas such as computer vision, natural language processing, and machine monitoring, leading to the realization of embedded intelligence at the edge. This task is challenging and needs significant knowledge of AI/ML applications, algorithms, and computer architecture, and of their interactions, to achieve the desired performance. In this tutorial we cover a few aspects that will help embedded systems designers and AI/ML engineers and scientists deploy AI/ML models on tiny edge devices at an optimum level of performance.
{"title":"TinyML Techniques for running Machine Learning models on Edge Devices","authors":"Arijit Mukherjee, A. Ukil, Swarnava Dey, Gitesh Kulkarni","doi":"10.1145/3564121.3564812","DOIUrl":"https://doi.org/10.1145/3564121.3564812","url":null,"abstract":"Resource-constrained platforms such as micro-controllers are the workhorses in embedded systems, being deployed to capture data from sensors and send the collected data to cloud for processing. Recently, a great interest is seen in the research community and industry to use these devices for performing Artificial Intelligence/Machine Learning (AI/ML) inference tasks in the areas of computer vision, natural language processing, machine monitoring etc. leading to the realization of embedded intelligence at the edge. This task is challenging and needs a significant knowledge of AI/ML applications, algorithms, and computer architecture and their interactions to achieve the desired performance. In this tutorial we cover a few aspects that will help embedded systems designers and AI/ML engineers and scientists to deploy the AI/ML models on the Tiny Edge Devices at an optimum level of performance.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127192726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}