A Scalable k-Medoids Clustering via Whale Optimization Algorithm
Huang Chenan, Narumasa Tsutsumida (arXiv:2408.16993, 2024-08-30)

Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns and insights in vast, unlabeled datasets. However, traditional methods such as Partitioning Around Medoids (PAM) struggle with scalability because of their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic modeled on the hunting strategies of humpback whales. By optimizing medoid selection, WOA-kMedoids reduces the computational complexity of the k-medoids algorithm from quadratic to near-linear in the number of observations. This improvement in efficiency makes WOA-kMedoids scalable to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids on 25 diverse time series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieves clustering accuracy comparable to PAM. While WOA-kMedoids exhibited slightly higher runtime than PAM on small datasets (fewer than 300 observations), it outperformed PAM in computational efficiency on larger datasets. The scalability of WOA-kMedoids, combined with its consistently high accuracy, makes it a promising and practical choice for unsupervised clustering in big data applications, with implications for efficient knowledge discovery in massive, unlabeled datasets across various domains.
TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
Christiaan Boerkamp, Steven van der Vlugt, Zaid Al-Ars (arXiv:2408.16551, 2024-08-29)

This paper introduces TINA, a novel framework for implementing non-neural-network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs, or FPGAs. The key to this approach is the concept of mapping mathematical and logic functions onto a series of convolutional and fully connected layers. By mapping functions onto such a small substack of NN layers, it becomes possible to execute non-NN algorithms efficiently on NN hardware (HW) accelerators, as well as to ensure the portability of TINA implementations to any platform that supports such NN accelerators. Results show that TINA is highly competitive compared to alternative frameworks, specifically for complex functions with iterations. For a polyphase filter bank use case, TINA shows GPU speedups of up to 80x over a CPU baseline with NumPy, compared to the 8x speedup achieved by alternative frameworks. The framework is open source and publicly available at https://github.com/ChristiaanBoe/TINA.
{"title":"TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators","authors":"Christiaan Boerkamp, Steven van der Vlugt, Zaid Al-Ars","doi":"arxiv-2408.16551","DOIUrl":"https://doi.org/arxiv-2408.16551","url":null,"abstract":"This paper introduces TINA, a novel framework for implementing non Neural\u0000Network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs\u0000or FPGAs. The key to this approach is the concept of mapping mathematical and\u0000logic functions as a series of convolutional and fully connected layers. By\u0000mapping functions into such a small substack of NN layers, it becomes possible\u0000to execute non-NN algorithms on NN hardware (HW) accelerators efficiently, as\u0000well as to ensure the portability of TINA implementations to any platform that\u0000supports such NN accelerators. Results show that TINA is highly competitive\u0000compared to alternative frameworks, specifically for complex functions with\u0000iterations. For a Polyphase Filter Bank use case TINA shows GPU speedups of up\u0000to 80x vs a CPU baseline with NumPy compared to 8x speedup achieved by\u0000alternative frameworks. The framework is open source and publicly available at\u0000https://github.com/ChristiaanBoe/TINA.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ppOpen-AT: A Directive-based Auto-tuning Language
Takahiro Katagiri (arXiv:2408.16607, 2024-08-29)

ppOpen-AT is a domain-specific language designed to ease the workload of developers creating libraries with auto-tuning (AT) capabilities. It consists of a set of directives that allow the code necessary for AT to be generated automatically from annotations placed in the source program. This approach significantly reduces the effort required of numerical library developers. This technical report details the implementation of the AT software and its extended functions, and explains the internal specifications of ppOpen-AT.
{"title":"ppOpen-AT: A Directive-base Auto-tuning Language","authors":"Takahiro Katagiri","doi":"arxiv-2408.16607","DOIUrl":"https://doi.org/arxiv-2408.16607","url":null,"abstract":"ppOpen-AT is a domain-specific language designed to ease the workload for\u0000developers creating libraries with auto-tuning (AT) capabilities. It consists\u0000of a set of directives that allow for the automatic generation of code\u0000necessary for AT by placing annotations in the source program. This approach\u0000significantly reduces the effort required by numerical library developers. This\u0000technical report details the implementation of the AT software and its extended\u0000functions, and provides an explanation of the internal specifications of\u0000ppOpen-AT.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient k-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures
Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette (arXiv:2408.16036, 2024-08-28)

The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap of data space partitions during index construction. This overlap increases node accesses during search and retrieval, resulting in higher resource consumption and performance bottlenecks and impeding system scalability. To address this issue, we propose three innovative heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insight into spatial relationships. The distance-based method (DBM) improves efficiency by using the distance between partition centers and their radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting objects that fall into multiple partitions, delivering an intuitive view of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures
Pooja Krishan, Rohan Mohapatra, Saptarshi Sengupta (arXiv:2408.14875, 2024-08-27)

The emergence of deep learning models has revolutionized various industries over the last decade, leading to a surge in connected devices and infrastructures. However, these models can be tricked into making incorrect predictions with high confidence, leading to disastrous failures and security concerns. To this end, we explore the impact of adversarial attacks on multivariate time-series forecasting and investigate methods to counter them. Specifically, we employ untargeted white-box attacks, namely the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM), to poison the inputs to the training process, effectively misleading the model. We also illustrate the subtle modifications to the inputs after the attack, which make the attack quite difficult to detect with the naked eye. Having demonstrated the feasibility of these attacks, we develop robust models through adversarial training and model hardening. We are among the first to showcase the transferability of these attacks and defenses by extrapolating our work from the benchmark electricity data to a larger, 10-year real-world dataset used for predicting the time-to-failure of hard disks. Our experimental results confirm that the attacks and defenses achieve the desired security thresholds, leading to a 72.41% and 94.81% decrease in RMSE for the electricity and hard disk datasets, respectively, after implementing the adversarial defenses.
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler (arXiv:2408.14090, 2024-08-26)

Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks with bandwidths of up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency is challenging due to differing technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth and that many opportunities for optimization remain, ranging from the network to the software stack.
{"title":"Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects","authors":"Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler","doi":"arxiv-2408.14090","DOIUrl":"https://doi.org/arxiv-2408.14090","url":null,"abstract":"Multi-GPU nodes are increasingly common in the rapidly evolving landscape of\u0000exascale supercomputers. On these systems, GPUs on the same node are connected\u0000through dedicated networks, with bandwidths up to a few terabits per second.\u0000However, gauging performance expectations and maximizing system efficiency is\u0000challenging due to different technologies, design options, and software layers.\u0000This paper comprehensively characterizes three supercomputers - Alps, Leonardo,\u0000and LUMI - each with a unique architecture and design. We focus on performance\u0000evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using\u0000a mix of intra-node and inter-node benchmarks. By analyzing its limitations and\u0000opportunities, we aim to offer practical guidance to researchers, system\u0000architects, and software developers dealing with multi-GPU supercomputing. Our\u0000results show that there is untapped bandwidth, and there are still many\u0000opportunities for optimization, ranging from network to software optimization.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FSDEM: Feature Selection Dynamic Evaluation Metric
Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek (arXiv:2408.14234, 2024-08-26)

Expressive evaluation metrics are indispensable for informative experiments in all areas, and while several metrics are established in some areas, in others, such as feature selection, only indirect or otherwise limited evaluation metrics are available. In this paper, we propose a novel evaluation metric that addresses several problems of its predecessors and allows for flexible and reliable evaluation of feature selection algorithms. The proposed metric is a dynamic metric with two properties that can be used to evaluate both the performance and the stability of a feature selection algorithm. We conduct several empirical experiments to illustrate the use of the proposed metric in the successful evaluation of feature selection algorithms. We also provide a comparison and analysis to show the different aspects involved in evaluating feature selection algorithms. The results indicate that the proposed metric is successful in carrying out the evaluation task for feature selection algorithms. This paper is an extended version of a paper accepted at SISAP 2024.
{"title":"FSDEM: Feature Selection Dynamic Evaluation Metric","authors":"Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek","doi":"arxiv-2408.14234","DOIUrl":"https://doi.org/arxiv-2408.14234","url":null,"abstract":"Expressive evaluation metrics are indispensable for informative experiments\u0000in all areas, and while several metrics are established in some areas, in\u0000others, such as feature selection, only indirect or otherwise limited\u0000evaluation metrics are found. In this paper, we propose a novel evaluation\u0000metric to address several problems of its predecessors and allow for flexible\u0000and reliable evaluation of feature selection algorithms. The proposed metric is\u0000a dynamic metric with two properties that can be used to evaluate both the\u0000performance and the stability of a feature selection algorithm. We conduct\u0000several empirical experiments to illustrate the use of the proposed metric in\u0000the successful evaluation of feature selection algorithms. We also provide a\u0000comparison and analysis to show the different aspects involved in the\u0000evaluation of the feature selection algorithms. The results indicate that the\u0000proposed metric is successful in carrying out the evaluation task for feature\u0000selection algorithms. This paper is an extended version of a paper accepted at SISAP 2024.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O Mahony (arXiv:2408.12173, 2024-08-22)

Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the area of the Semantic Web, and are gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. To this end, we present a classification of the primary areas of knowledge graph technology that harness different hardware units to accelerate certain knowledge graph functionalities. We then describe the respective works in detail, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value to both academics and industry practitioners.
{"title":"Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments","authors":"Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O Mahony","doi":"arxiv-2408.12173","DOIUrl":"https://doi.org/arxiv-2408.12173","url":null,"abstract":"Knowledge graphs (KGs) have achieved significant attention in recent years,\u0000particularly in the area of the Semantic Web as well as gaining popularity in\u0000other application domains such as data mining and search engines.\u0000Simultaneously, there has been enormous progress in the development of\u0000different types of heterogeneous hardware, impacting the way KGs are processed.\u0000The aim of this paper is to provide a systematic literature review of knowledge\u0000graph hardware acceleration. For this, we present a classification of the\u0000primary areas in knowledge graph technology that harnesses different hardware\u0000units for accelerating certain knowledge graph functionalities. We then\u0000extensively describe respective works, focusing on how KG related schemes\u0000harness modern hardware accelerators. Based on our review, we identify various\u0000research gaps and future exploratory directions that are anticipated to be of\u0000significant value both for academics and industry practitioners.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation
Nishan Gunawardena, Gough Yumu Lui, Jeewani Anupama Ginige, Bahman Javadi (arXiv:2408.12463, 2024-08-22)

A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy when applied to video-type visual stimuli, as they are typically trained on static images. In addition, the increasing demand for real-time interactive applications such as games, VR, and AR on smartphones requires overcoming the limitations posed by resource constraints such as limited computational power, battery life, and network bandwidth. We therefore developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved average Root Mean Square Errors of 0.955 cm and 1.091 cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied various optimisation methods, such as quantisation and pruning, to the deep learning models to improve energy, CPU, and memory usage on edge devices, with a focus on real-time processing. Using model quantisation, the inference time of the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
{"title":"Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation","authors":"Nishan Gunawardena, Gough Yumu Lui, Jeewani Anupama Ginige, Bahman Javadi","doi":"arxiv-2408.12463","DOIUrl":"https://doi.org/arxiv-2408.12463","url":null,"abstract":"A significant limitation of current smartphone-based eye-tracking algorithms\u0000is their low accuracy when applied to video-type visual stimuli, as they are\u0000typically trained on static images. Also, the increasing demand for real-time\u0000interactive applications like games, VR, and AR on smartphones requires\u0000overcoming the limitations posed by resource constraints such as limited\u0000computational power, battery life, and network bandwidth. Therefore, we\u0000developed two new smartphone eye-tracking techniques for video-type visuals by\u0000combining Convolutional Neural Networks (CNN) with two different Recurrent\u0000Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent\u0000Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean\u0000Square Error of 0.955cm and 1.091cm, respectively. To address the computational\u0000constraints of smartphones, we developed an edge intelligence architecture to\u0000enhance the performance of smartphone-based eye tracking. We applied various\u0000optimisation methods like quantisation and pruning to deep learning models for\u0000better energy, CPU, and memory usage on edge devices, focusing on real-time\u0000processing. Using model quantisation, the model inference time in the CNN+LSTM\u0000and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge\u0000devices.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RAO-SS: A Prototype of Run-time Auto-tuning Facility for Sparse Direct Solvers
Takahiro Katagiri, Yoshinori Ishii, Hiroki Honda (arXiv:2408.11880, 2024-08-21)

In this paper, a run-time auto-tuning method that adapts performance parameters to the input matrices is proposed. RAO-SS (Run-time Auto-tuning Optimizer for Sparse Solvers), a prototype of auto-tuning software based on the proposed method, is also evaluated. RAO-SS is implemented with the Autopilot, which is middleware that supports run-time auto-tuning with a fuzzy-logic function. The target numerical library is SuperLU, a sparse direct solver for linear equations. The results indicate that (1) speedup factors of 1.2 on average and 3.6 at maximum over default executions were obtained, and (2) the software overhead of the Autopilot is negligible in RAO-SS.