Real-world intelligent services employing deep learning technology typically adopt a two-tier system architecture: a dumb front-end device and smart back-end cloud servers. The front-end device simply forwards a human query, while the back-end servers run a complex deep model to resolve the query and respond to the front-end device. While simple and effective, this architecture not only increases the load on the servers but also risks harming user privacy. In this paper, we present knowledge caching, which exploits the front-end device as a smart cache of a generalized deep model. The cache locally resolves a subset of popular or privacy-sensitive queries and forwards the rest to the back-end cloud servers. We discuss the feasibility of knowledge caching as well as the technical challenges around deep model specialization and compression. We present a prototype two-stage inference system that populates a front-end cache with 10 of 35 voice commands. We demonstrate that our specialization and compression techniques reduce the cached model size by 17.4x relative to the original model, with a 1.8x improvement in inference accuracy.
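The two-stage flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the confidence threshold, the `cached_labels` set, and the toy stand-in models are all assumptions introduced here, and the abstract does not say how the prototype decides when to fall back to the cloud.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical interface: a model maps a query to per-label confidence scores.
Scores = Dict[str, float]
Model = Callable[[str], Scores]


def two_stage_infer(
    query: str,
    cache_model: Model,
    backend_model: Model,
    cached_labels: Set[str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Tuple[str, str]:
    """Return (label, source), where source is "cache" or "backend"."""
    # Stage 1: try the small, specialized model on the front-end device.
    scores = cache_model(query)
    label = max(scores, key=lambda k: scores[k])
    if label in cached_labels and scores[label] >= threshold:
        return label, "cache"

    # Stage 2: cache miss, so forward the query to the full back-end model.
    backend_scores = backend_model(query)
    best = max(backend_scores, key=lambda k: backend_scores[k])
    return best, "backend"


def toy_cache_model(query: str) -> Scores:
    # Stand-in for an on-device model that only knows the "stop" command.
    if query == "stop":
        return {"stop": 0.95, "unknown": 0.05}
    return {"stop": 0.10, "unknown": 0.90}


def toy_backend_model(query: str) -> Scores:
    # Stand-in for a cloud model that resolves every query.
    return {query: 1.0}
```

With these toy models, a "stop" query is answered locally, while any other query (or a "stop" query under a stricter threshold) is forwarded to the back end.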
{"title":"A Case for Two-stage Inference with Knowledge Caching","authors":"Geonha Park, Changho Hwang, KyoungSoo Park","doi":"10.1145/3325413.3329789","DOIUrl":"https://doi.org/10.1145/3325413.3329789","url":null,"abstract":"Real-world intelligent services employing deep learning technology typically take a two-tier system architecture -- a dumb front-end device and smart back-end cloud servers. The front-end device simply forwards a human query while the back-end servers run a complex deep model to resolve the query and respond to the front-end device. While simple and effective, the current architecture not only increases the load at servers but also runs the risk of harming user privacy. In this paper, we present knowledge caching, which exploits the front-end device as a smart cache of a generalized deep model. The cache locally resolves a subset of popular or privacy-sensitive queries while it forwards the rest of them to back-end cloud servers. We discuss the feasibility of knowledge caching as well as technical challenges around deep model specialization and compression. We show our prototype two-stage inference system that populates a front-end cache with 10 voice commands out of 35 commands. We demonstrate that our specialization and compression techniques reduce the cached model size by 17.4x from the original model with 1.8x improvement on the inference accuracy.","PeriodicalId":164793,"journal":{"name":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126835800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the development of ICT, services using the Internet of Things (IoT) have been implemented in various fields. Among them, location-based services using beacons have the advantage that they can operate semi-permanently thanks to Bluetooth Low Energy (BLE). In this paper, we exploit this advantage to infer the indoor location of a beacon. We install multiple beacon transceivers on one floor of a building and learn the location of the beacon transmitter using a neural network. As a result, the neural network achieved high indoor localization accuracy.
{"title":"Bluetooth Beacon-Based Indoor Localization Using Self-Learning Neural Network","authors":"Kisu Ok, Dongwoo Kwon, Youngmin Ji","doi":"10.1145/3325413.3329792","DOIUrl":"https://doi.org/10.1145/3325413.3329792","url":null,"abstract":"With the development of ICT technology, services using the Internet of Things (IoT) have been implemented in various fields. Among them, location-based services using beacons have the advantage that they can be used semi-permanently using Bluetooth Low Energy (BLE). In this paper, we utilize these advantages to infer indoor localization of beacon. Install multiple beacon transceivers on one floor of the building and learn the location of the beacon transmitter using neural network learning. As a result, neural network learning showed high indoor localization accuracy.","PeriodicalId":164793,"journal":{"name":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130339275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning (DL) computation offloading is commonly adopted to enable computation-intensive DL techniques on resource-constrained devices. However, sending private user data to an external server raises serious privacy concerns. In this paper, we introduce a privacy-invading input reconstruction method that utilizes intermediate data of the DL computation pipeline. To do so, we first define a Peak Signal-to-Noise Ratio (PSNR)-based metric for assessing input reconstruction quality. Then, we simulate a privacy attack on diverse DL models to find the relationship between DL model structure and the performance of privacy attacks. Finally, we provide several insights on DL model structure design to prevent reconstruction-based privacy attacks: using skip connections, making the model deeper, and including various DL operations such as inception modules.
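For readers unfamiliar with PSNR, the standard definition is 10 · log10(MAX² / MSE), where MSE is the mean squared error between the original and reconstructed signals and MAX is the largest possible sample value. The sketch below computes that standard quantity over flat pixel sequences. It is only an illustration: the abstract does not specify how the authors' PSNR-based metric is derived from it, and the function name and the default `max_value` of 255 (8-bit images) are assumptions made here.

```python
import math
from typing import Sequence


def psnr(
    original: Sequence[float],
    reconstructed: Sequence[float],
    max_value: float = 255.0,  # assumes 8-bit pixel intensities
) -> float:
    """Standard PSNR in decibels between two equally sized pixel sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error between the original input and its reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # a perfect reconstruction has unbounded PSNR

    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstruction is closer to the original input, which in the attack setting corresponds to a more successful reconstruction.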
{"title":"Exploring Image Reconstruction Attack in Deep Learning Computation Offloading","authors":"Hyunseok Oh, Youngki Lee","doi":"10.1145/3325413.3329791","DOIUrl":"https://doi.org/10.1145/3325413.3329791","url":null,"abstract":"Deep learning (DL) computation offloading is commonly adopted to enable the use of computation-intensive DL techniques on resource-constrained devices. However, sending private user data to an external server raises a serious privacy concern. In this paper, we introduce a privacy-invading input reconstruction method which utilizes intermediate data of the DL computation pipeline. In doing so, we first define a Peak Signal-to-Noise Ratio (PSNR)-based metric for assessing input reconstruction quality. Then, we simulate a privacy attack on diverse DL models to find out the relationship between DL model structures and performance of privacy attacks. Finally, we provide several insights on DL model structure design to prevent reconstruction-based privacy attacks: using skip-connection, making model deeper, including various DL operations such as inception module.","PeriodicalId":164793,"journal":{"name":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131635386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Offloading computations to servers is a promising way for resource-constrained devices to run deep neural networks (DNNs). It often requires pre-installing DNN models at the server, which is not a valid assumption in an edge server environment where a client can offload to any nearby server, especially when it is on the move. The client therefore needs to upload the DNN model on demand, but uploading all the layers at once can seriously delay the offloading of DNN queries because of the high upload overhead. IONN is a technique that partitions the layers and uploads them incrementally so that offloading can start quickly [1]. It partitions the DNN layers using the shortest path on a DNN execution graph between the client and the server, based on a penalty factor for the uploading overhead. This paper proposes a new partitioning algorithm based on efficiency, which generates a more fine-grained uploading plan. Experimental results show that the proposed algorithm improves query performance during uploading by as much as 55%, with faster execution of the initially raised queries.
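The abstract does not define "efficiency", so the following is only one plausible reading, offered as an illustration and not as the paper's algorithm: treat efficiency as the query latency saved per uploaded byte and upload the most efficient layers first. The `Layer` fields, the `efficiency` ratio, and the greedy ordering are all assumptions introduced here. The sketch also ignores the execution-graph constraints that a real partitioner such as IONN must respect.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    upload_bytes: int  # size of the layer's parameters to send to the server
    saved_ms: float    # assumed per-query latency saved once it runs on the server


def efficiency(layer: Layer) -> float:
    """Assumed efficiency measure: latency saved per uploaded byte."""
    if layer.upload_bytes <= 0:
        raise ValueError(f"layer {layer.name!r} must have a positive upload size")
    return layer.saved_ms / layer.upload_bytes


def upload_order(layers: List[Layer]) -> List[Layer]:
    """Greedy plan: upload the layers with the highest efficiency first."""
    return sorted(layers, key=efficiency, reverse=True)
```

For example, with purely illustrative numbers, `upload_order([Layer("conv1", 1000, 5.0), Layer("fc1", 100000, 10.0), Layer("conv2", 2000, 20.0)])` places `conv2` first, then `conv1`, then `fc1`. The large fully connected layer saves time but costs far more to upload, so it goes last.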
{"title":"Enhanced Partitioning of DNN Layers for Uploading from Mobile Devices to Edge Servers","authors":"K. Shin, H. Jeong, Soo-Mook Moon","doi":"10.1145/3325413.3329788","DOIUrl":"https://doi.org/10.1145/3325413.3329788","url":null,"abstract":"Offloading computations to servers is a promising method for resource constrained devices to run deep neural network (DNN). It often requires pre-installing DNN models at the server, which is not a valid assumption in an edge server environment where a client can offload to any nearby server, especially when it is on the move. So, the client needs to upload the DNN model on demand, but uploading the entire layers at once can seriously delay the offloading of the DNN queries due to its high overhead. IONN is a technique to partition the layers and upload them incrementally for fast start of offloading [1]. It partitions the DNN layers using the shortest path on a DNN execution graph between the client and the server based on a penalty factor for the uploading overhead. This paper proposes a new partition algorithm based on efficiency, which generates a more fine-grained uploading plan. Experimental results show that the proposed algorithm tangibly improves the query performance during uploading by as much as 55%, with faster execution of initially-raised queries.","PeriodicalId":164793,"journal":{"name":"The 3rd International Workshop on Deep Learning for Mobile Systems and Applications - EMDL '19","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130534984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}