Computer and mobile users often call everything that disturbs or corrupts their system a VIRUS, without being aware of what the term actually means or what such software does. This tutorial systematically introduces the different malware varieties, their distinctive properties, the different methods of analyzing malware, and the techniques used to detect it.
{"title":"Malware Analysis and Detection","authors":"Hemant Rathore, Mohit Sewak","doi":"10.1145/3564121.3564809","DOIUrl":"https://doi.org/10.1145/3564121.3564809","url":null,"abstract":"Often computer/mobile users call everything that disturbs/corrupts their system a VIRUS without being aware of what it means or accomplishes. This tutorial systematically introduces the different malware varieties, their distinctive properties, different methods of analyzing the malware, and their detection techniques.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116724871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kavya Borra, Ashwin Krishnan, H. Khadilkar, M. Nambiar, Ansuma Basumatary, Rekha Singhal, A. Mukherjee
Online 3D bin packing is a challenging real-time combinatorial optimisation problem that involves packing parcels (typically rigid cuboids) arriving on a conveyor into a larger bin for further shipment. Recent automation methods have introduced manipulator robots for packing, which need a processing algorithm to specify the location and orientation in which each parcel must be loaded. Value-based reinforcement learning (RL) algorithms such as DQN are capable of producing good solutions within the available computation time. However, their deployment on CPU-based systems relies on rule-based heuristics to reduce the search space, which may lead to sub-optimal solutions. In this paper, we use an FPGA as a hardware accelerator to reduce the inference time of DQN as well as its pre- and post-processing steps. This allows the optimised algorithm to cover the entire search space within the given time constraints. We present various optimizations, such as accelerating DQN model inference and fast checking of constraints. Further, we show that our proposed architecture achieves almost 15x computational speed-up compared to an equivalent CPU implementation. Additionally, we show that, as a result of evaluating the entire search space, the DQN rewards generated for complex data sets improve by 1%, which can translate into a significant reduction in enterprise operating costs.
{"title":"Performance improvement of reinforcement learning algorithms for online 3D bin packing using FPGA","authors":"Kavya Borra, Ashwin Krishnan, H. Khadilkar, M. Nambiar, Ansuma Basumatary, Rekha Singhal, A. Mukherjee","doi":"10.1145/3564121.3564795","DOIUrl":"https://doi.org/10.1145/3564121.3564795","url":null,"abstract":"Online 3D bin packing is a challenging real-time combinatorial optimisation problem that involves packing of parcels (typically rigid cuboids) arriving on a conveyor into a larger bin for further shipment. Recent automation methods have introduced manipulator robots for packing, which need a processing algorithm to specify the location and orientation in which each parcel must be loaded. Value-based Reinforcement learning (RL) algorithms such as DQN are capable of producing good solutions in the available computation times. However, their deployment on CPU based systems employs rule-based heuristics to reduce the search space which may lead to a sub-optimal solution. In this paper, we use FPGA as a hardware accelerator to reduce inference time of DQN as well as its pre-/post-processing steps. This allows the optimised algorithm to cover the entire search space within the given time constraints. We present various optimizations, such as accelerating DQN model inference and fast checking of constraints. Further, we show that our proposed architecture achieves almost 15x computational speed-ups compared to an equivalent CPU implementation. Additionally, we show that as a result of evaluating the entire search space, the DQN rewards generated for complex data sets has improved by 1%, which can cause a significant reduction in enterprise operating costs.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122765541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sujoy Roy Chowdhury, Serene Banerjee, Ranjani H. G., Chaitanya Kapoor
Telecommunications networks operate on enormous amounts of time-series data and often exhibit anomalous trends in their behaviour, caused by increased latency and reduced throughput in the network, which inevitably leads to poor customer experience [17]. One of the common machine learning problems in the telecom domain is to predict anomalous behaviour ahead of time. Whilst this is a well-researched problem, far less work has been done on identifying causal structures from the temporal patterns of the various Key Performance Indicators (KPIs) in the telecom network. The ability to identify causal structures from anomalous behaviours would allow more effective intervention and generalisation across different environments and networks. The tutorial focuses on existing frameworks for causal discovery on time-series data sets. In this hands-on tutorial, we will cover at least three state-of-the-art (SOTA) methods for causal time-series analysis, including Granger causality [8], convergent cross-mapping (CCM) [4, 10, 15], Peter-Clark Momentary Conditional Independence (PC-MCI) [6, 14] and the Temporal Causal Discovery Framework (TCDF) [11]. The need for causation analysis [7] beyond correlation will also be explained using publicly available datasets, such as the double pendulum dataset [1]. The methods are chosen to cover various aspects of causal time-series analysis: modelling non-linearity (non-linear Granger causality), approaching the problem from chaos and dynamical systems (CCM), information-theoretic approaches (PC-MCI), and data-driven approaches (TCDF). State-of-the-art survey papers [2, 12] show that none of the methods is ideal for all possible time series, and each has relative advantages and shortcomings.
{"title":"Identification of Causal Dependencies in Multivariate Time Series","authors":"Sujoy Roy Chowdhury, Serene Banerjee, Ranjani H. G., Chaitanya Kapoor","doi":"10.1145/3564121.3564810","DOIUrl":"https://doi.org/10.1145/3564121.3564810","url":null,"abstract":"Telecommunications networks operate on enormous amount of time-series data, and often exhibit anomalous trends in their behaviour. This is caused due to increased latency and reduced throughput in the network which inevitably leads to poor customer experience [17]. One of the common problems in machine learning in the telecom domain is to predict anomalous behaviour ahead of time. Whilst this is a well-researched problem, there is far less work done in identifying causal structures from the temporal patterns of various Key Performance Indicators (KPI) in the telecom network. The ability to identify causal structures from anomalous behaviours would allow more effective intervention and generalisation of different environments and networks. The tutorial is focused on discussing existing frameworks for establishing causal discovery for time-series data sets. In this hands-on tutorial, we will be covering at least 3 state-of-the-art (SOTA) methods on causal time series analysis including Granger causality[8],convergent cross-mapping [4, 10, 15], Peter-Clark Momentary Conditional Independence (PC-MCI) [6, 14] and Temporal Causal discovery framework (TCDF)[11]. The need for a causation analysis[7], beyond correlation will also be explained using publicly available datasets, such as, double pendulum dataset [1]. The state-of-art methods are chosen to cover various aspects of the causal time series analysis, such as modelling the non-linearity (non-linear Granger Causality), attempting the problem from chaos and dynamic systems (CCM), information-theoretic approaches (PC-MCI, or having a data-driven approach (TCDF). State-of-the-art survey papers [2, 12] show that none of the methods can be said to be ideal for all the possible time series and there are relative advantages and shortcomings for each of these methods.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"410 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115742466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting distribution shift is required to prevent a machine learning model from degrading in performance, and to keep human-mediated data analysis from reaching erroneous conclusions. For comparing unknown distributions of high-dimensional data, histograms are suitable density estimators due to their computational efficiency. It is important for histograms used in distribution shift detection to have uniform density, as demonstrated by existing tree-based or cluster-based histograms. However, existing histograms do not consider generalization to out-of-sample data, which degrades detection performance at test time. In this paper, we propose a neural-based histogram for distribution shift detection that generalizes well to out-of-sample data. The bins of the histogram are determined by a model trained to discriminate between a handful of reference instances, which reflects their underlying distribution. Due to a batch-wise maximum entropy regularizer calculated from a bootstrap sample, the bins, as subsets of the feature space partitioned by the decision boundaries of the model, generalize, and thus the histogram keeps its density uniform for out-of-sample data. We evaluate our method on a distribution shift detection task using multi-domain real-world datasets. The results show that our method outperforms state-of-the-art histogram-based methods.
{"title":"RIDEN: Neural-based Uniform Density Histogram for Distribution Shift Detection","authors":"Kei Yonekawa, Kazuhiro Saito, Mori Kurokawa","doi":"10.1145/3564121.3564136","DOIUrl":"https://doi.org/10.1145/3564121.3564136","url":null,"abstract":"It is required to detect distribution shift in order to prevent a machine learning model from performance degradation, and human-mediated data analysis from erroneous conclusions. For the purpose of comparing between unknown distributions of high-dimensional data, histograms are suitable density estimators due to its computational efficiency. It is important for histograms for distribution shift detection to have uniform density, which has been demonstrated in existing tree-based or cluster-based histograms. However, existing histograms do not consider generalization capability to out-of-sample data, resulting in degraded detection performance at test time. In this paper, we propose a neural-based histogram for distribution shift detection, which generalizes well to out-of-sample data. The bins of histogram are determined by a model trained to discriminate between a handful reference instances, which reflects their underlying distribution. Due to the batch-wise maximum entropy regularizer calculated from a bootstrap sample, the bins as a subset of the feature space partitioned by the decision boundaries of the model generalize, and thus the histogram keeps its density uniform for out-of-sample data. We evaluate our method on distribution shift detection task using multi-domain real-world datasets. The results show that our method outperforms state-of-the-art histogram-based methods.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115904571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Product images are the most compelling medium of customer interaction on the product detail pages of e-commerce websites. Millions of products are onboarded onto webstore catalogues daily, and maintaining a high quality bar for a product's set of images is a problem at scale. Grouping products by category, clothing is a very high-volume and high-velocity category and thus deserves its own attention. At this scale it is challenging to monitor the completeness of the image set that adequately details the product for consumers, and incomplete image sets often lead to a poor customer experience and customer drop-off. To supervise the quality and completeness of the images on the product pages for these product types and to suggest improvements, we propose a Human Pose Detection based unsupervised method that scans the image set of a product for missing poses. The unsupervised approach treats sellers fairly across products and categories, irrespective of any biases. We first create a reference image set of popular products with complete image sets. We then cluster the images to label the most desirable poses, which form the classes of the reference set derived from these ideal products. Further, for every test product we scan its images for all desired pose classes with respect to the reference-set poses, determine the missing ones, and sort them in order of potential impact. These missing poses can then be used by sellers to add enriched product listing images. We gathered data from a popular online webstore and manually surveyed ~200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (~20K images), a significant proportion of which had scope for adding many image variants, whereas highly rated products had more than double the image variants, indicating that our model can potentially be used at large scale.
{"title":"Unposed: Unsupervised Pose Estimation based Product Image Recommendations","authors":"Saurabh Sharma, Faizan Ahemad","doi":"10.1145/3564121.3564126","DOIUrl":"https://doi.org/10.1145/3564121.3564126","url":null,"abstract":"Product images are the most impressing medium of customer interaction on the product detail pages of e-commerce websites. Millions of products are onboarded on to webstore catalogues daily and maintaining a high quality bar for a product’s set of images is a problem at scale. Grouping products by categories, clothing is a very high volume and high velocity category and thus deserves its own attention. Given the scale it is challenging to monitor the completeness of image set, which adequately details the product for the consumers, which in turn often leads to a poor customer experience and thus customer drop off. To supervise the quality and completeness of the images in the product pages for these product types and suggest improvements, we propose a Human Pose Detection based unsupervised method to scan the image set of a product for the missing ones. The unsupervised approach suggests a fair approach to sellers based on product and category irrespective of any biases. We first create a reference image set of popular products with wholesome imageset. Then we create clusters of images to label most desirable poses to form the classes for the reference set from these ideal products set. Further, for all test products we scan the images for all desired pose classes w.r.t. reference set poses, determine the missing ones and sort them in the order of potential impact. These missing poses can further be used by the sellers to add enriched product listing image. We gathered data from popular online webstore and surveyed ~200 products manually, a large fraction of which had at least 1 repeated image or missing variant, and sampled 3K products(~20K images) of which a significant proportion had scope for adding many image variants as compared to high rated products which had more than double image variants, indicating that our model can potentially be used on a large scale.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125425807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anirban I Ghosh, Radhika Sharma, Karan Goyal, Balakarthikeyan Rajan, S. Mani
Businesses are increasingly reliant on Machine Learning models to manage user experiences. It becomes important not only to focus on building robust, state-of-the-art models but also to continuously monitor and evaluate them. Continuous monitoring enables the AI team to ensure the right frequency of model training and to proactively investigate erroneous patterns and predictions before they have a wider business impact. A robust and effective monitoring system is thus needed to ensure business and engineering teams are aware of model performance and of any data anomalies which could impact downstream model accuracy. In this paper, we present our Health Assurance model monitoring solution. Currently, the system serves the health monitoring needs of more than 250 models across 11 AI verticals with an average anomaly detection precision of 60%.
{"title":"Health Assurance: AI Model Monitoring Platform","authors":"Anirban I Ghosh, Radhika Sharma, Karan Goyal, Balakarthikeyan Rajan, S. Mani","doi":"10.1145/3564121.3564798","DOIUrl":"https://doi.org/10.1145/3564121.3564798","url":null,"abstract":"Businesses are increasingly reliant on Machine Learning models to manage user experiences. It becomes important to not only focus on building robust and state-of-the-art models but also continuously monitor and evaluate them. Continuous monitoring enables the AI team to ensure the right frequency of model training and pro-actively investigate erroneous patterns and predictions, before it has a wider business impact. A robust and effective monitoring system is thus needed to ensure business and engineering teams are aware of model performance and any data anomalies which could impact downstream model accuracy. In this paper, we present our Health Assurance model monitoring solution. Currently, the system serves the health monitoring needs of more than 250 models across 11 AI verticals with an average anomaly detection precision of 60%.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117314480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we propose a Deep Learning (DL) based error correction system termed DEC. It predicts the transmitted symbols at the receiver using the received soft symbols and the channel state information (CSI) of the transmission link. Hence, the proposed system eliminates the need for complex channel coding/decoding blocks in the wireless communication system. Specifically, we explore the application of the proposed DEC system to Spatial Modulation-OFDM (SM-OFDM) systems. SM is a technique that avoids inter-channel interference (ICI) at the receiver input and also offers a good balance between energy and spectral efficiency. Together with the DEC system, this can prove to be of interest for next-generation wireless systems, particularly for Internet-of-Things (IoT) devices that require optimal bit-error ratios (BER) at moderate data rates. The performance of the proposed system is compared with a Trellis-coded SM (TCSM) system. The obtained simulation results verify the superiority of the DEC-aided SM-OFDM system over TCSM in terms of both BER and throughput.
{"title":"DEC-aided SM-OFDM: A Spatial Modulation System with Deep Learning based Error Correction","authors":"H. Verma, V. Bohara, Anubha Gupta","doi":"10.1145/3564121.3564131","DOIUrl":"https://doi.org/10.1145/3564121.3564131","url":null,"abstract":"In this work, we propose a Deep Learning (DL) based error correction system termed as DEC. It predicts the transmitted symbols at the receiver using the received soft symbols and channel state information (CSI) of the transmission link. Hence, the proposed system eliminates the need of using complex channel coding/decoding blocks in the wireless communication system. Specifically, we explore the application of proposed DEC system for Spatial Modulation-OFDM (SM-OFDM) systems. SM is a technique that avoids inter-channel interference (ICI) at receiver input, also offers a good balance between the energy and spectral efficiency. This together with DEC system can prove to be of interest for the next generation wireless system, particularly for the Internet-of-Things (IoT) devices that require optimal bit-error ratios (BER) at moderate data rates. The performance of the proposed system is compared with Trellis coded-SM (TCSM) system. The obtained simulation results successfully verify the superiority of the DEC-aided SM-OFDM system over the TCSM in terms of both BER and throughput.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126121536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Neural Networks (DNNs) have made machine learning accessible to a wide set of practitioners working on field deployment of analytics algorithms over sensor data. Along with this, the focus on data privacy, low-latency inference, and sustainability has highlighted the need for efficient in-situ analytics close to the sensors, at the edge of the network, which is challenging given the constrained nature of edge platforms, including Common Off-the-Shelf (COTS) AI accelerators. Efficient DNN model partitioning across multiple edge nodes is a well-studied approach, but no definitive characterization exists of why DNN model partitioning improves performance, or of whether the benefits hold for currently used edge hardware and state-of-the-art DNN models. In this paper, we present a detailed study and analysis to address the above-mentioned shortcomings, and propose a framework that automatically determines the best partitioning scheme and enhances system efficiency.
{"title":"Automated Deep Learning Model Partitioning for Heterogeneous Edge Devices","authors":"Arijit Mukherjee, Swarnava Dey","doi":"10.1145/3564121.3564796","DOIUrl":"https://doi.org/10.1145/3564121.3564796","url":null,"abstract":"Deep Neural Networks (DNN) have made machine learning accessible to a wide set of practitioners working with field deployment of analytics algorithms over sensor data. Along with it, focus on data privacy, low latency inference, and sustainability has highlighted the need for efficient in-situ analytics close to sensors, at the edge of the network, which is challenging given the constrained nature of the edge platforms, including Common Off-the-Shelf (COTS) AI accelerators. Efficient DNN model partitioning across multiple edge nodes is a well-studied approach, but no definitive characterization exists as to why there is a performance improvement due to DNN model partitioning, and whether the benefits hold for currently used edge hardware & state-of-the-art DNN models. In this paper, we present a detailed study and analyses to address the above-mentioned shortcomings and propose a framework that automatically determines the best partitioning scheme and enhances system efficiency.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128266808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chinmay Mahajan, Ashwin Krishnan, M. Nambiar, Rekha Singhal
We see two trends emerging from the exponential increase in AI research: rising adoption of AI-based models in enterprise applications, and the development of different types of hardware accelerators with varying memory and computing architectures for accelerating AI workloads. Accelerators may have different types of memory, varying in access latency and storage capacity. A recommendation model's inference latency is highly influenced by the time to fetch embeddings from the embedding tables. In this paper, we present Hetero-Rec, a framework for the optimal deployment of embeddings for faster inference of recommendation models. The main idea is to cache frequently accessed embeddings in faster memories to reduce average latency during inference. Hetero-Rec uses a performance-model-based optimization algorithm and a spline-based learned index to determine the optimal reservation of portions of the embedding tables across the different memory types available for deployment, based on their past access patterns. We validate our approach for heterogeneous memory architectures, such as URAM (Ultra-Random Access Memory), BRAM (Block Random Access Memory), HBM (High-Bandwidth Memory), and DDR (Double Data Rate) memory, on a server platform with an FPGA accelerator. We observe that the presented optimization algorithm for dynamic placement of embedding tables reduces average latency by up to 1.52x, 1.68x, and 2.91x for the weekly, daily, and hourly access patterns in the transaction history, respectively, as compared to state-of-the-art systems.
{"title":"Hetero-Rec: Optimal Deployment of Embeddings for High-Speed Recommendations","authors":"Chinmay Mahajan, Ashwin Krishnan, M. Nambiar, Rekha Singhal","doi":"10.1145/3564121.3564134","DOIUrl":"https://doi.org/10.1145/3564121.3564134","url":null,"abstract":"We see two trends emerging due to exponential increase in AI research- rise in adoption of AI based models in enterprise applications and development of different types of hardware accelerators with varying memory and computing architectures for accelerating AI workloads. Accelerators may have different types of memories, varying on access latency and storage capacity. A recommendation model’s inference latency is highly influenced by the time to fetch embeddings from the embedding tables. In this paper, we present Hetero-Rec, a framework for optimal deployment of embeddings for faster inference of recommendation model. The main idea is to cache frequently accessed embeddings on faster memories to reduce average latency during inference. Hetero-Rec uses performance model-based optimization algorithm and use of spline based learned index for determining the optimal reservation of portions of embedding tables across different memory types available for deployment, based on their past access patterns. We validate our approach for heterogeneous memory architectures, such as URAM (Ultra-Random Access Memory), BRAM (Block Random Access Memory), HBM (High-Bandwidth Memory) and DDR (Double Data Rate) on a server platform with an FPGA accelerator. We observe that the presented optimization algorithm for dynamic placement of embedding tables yields a reduction on average latency of up to 1.52x, 1.68x, and 2.91x for the weekly, daily, and hourly access patterns, respectively in the transaction history as compared to the state-of-the-art systems.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"66 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133488632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arijit Mukherjee, A. Ukil, Swarnava Dey, Gitesh Kulkarni
Resource-constrained platforms such as microcontrollers are the workhorses of embedded systems, deployed to capture data from sensors and send the collected data to the cloud for processing. Recently, great interest has been seen in the research community and industry in using these devices to perform Artificial Intelligence/Machine Learning (AI/ML) inference tasks in areas such as computer vision, natural language processing, and machine monitoring, leading to the realization of embedded intelligence at the edge. This task is challenging and needs significant knowledge of AI/ML applications, algorithms, and computer architecture, and of their interactions, to achieve the desired performance. In this tutorial we cover a few aspects that will help embedded systems designers and AI/ML engineers and scientists deploy AI/ML models on tiny edge devices at an optimum level of performance.
{"title":"TinyML Techniques for running Machine Learning models on Edge Devices","authors":"Arijit Mukherjee, A. Ukil, Swarnava Dey, Gitesh Kulkarni","doi":"10.1145/3564121.3564812","DOIUrl":"https://doi.org/10.1145/3564121.3564812","url":null,"abstract":"Resource-constrained platforms such as micro-controllers are the workhorses in embedded systems, being deployed to capture data from sensors and send the collected data to cloud for processing. Recently, a great interest is seen in the research community and industry to use these devices for performing Artificial Intelligence/Machine Learning (AI/ML) inference tasks in the areas of computer vision, natural language processing, machine monitoring etc. leading to the realization of embedded intelligence at the edge. This task is challenging and needs a significant knowledge of AI/ML applications, algorithms, and computer architecture and their interactions to achieve the desired performance. In this tutorial we cover a few aspects that will help embedded systems designers and AI/ML engineers and scientists to deploy the AI/ML models on the Tiny Edge Devices at an optimum level of performance.","PeriodicalId":166150,"journal":{"name":"Proceedings of the Second International Conference on AI-ML Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127192726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}