A Scalable k-Medoids Clustering via Whale Optimization Algorithm
Huang Chenan, Narumasa Tsutsumida (arXiv:2408.16993, 2024-08-30)

Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns and insights in vast, unlabeled datasets. However, traditional methods such as Partitioning Around Medoids (PAM) struggle with scalability because of their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic modeled on the hunting strategies of humpback whales. By optimizing medoid selection, WOA-kMedoids reduces the computational complexity of the k-medoids algorithm from quadratic to near-linear in the number of observations. This improvement in efficiency makes WOA-kMedoids scalable to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids on 25 diverse time series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieves clustering accuracy comparable to PAM. While WOA-kMedoids exhibited slightly higher runtime than PAM on small datasets (fewer than 300 observations), it outperformed PAM in computational efficiency on larger datasets. The scalability of WOA-kMedoids, combined with its consistently high accuracy, makes it a promising and practical choice for unsupervised clustering in big data applications, with implications for efficient knowledge discovery in massive, unlabeled datasets across various domains.
TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
Christiaan Boerkamp, Steven van der Vlugt, Zaid Al-Ars (arXiv:2408.16551, 2024-08-29)

This paper introduces TINA, a novel framework for implementing non-neural-network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs, or FPGAs. The key to this approach is the concept of mapping mathematical and logic functions onto a series of convolutional and fully connected layers. By mapping functions onto such a small substack of NN layers, it becomes possible to execute non-NN algorithms efficiently on NN hardware (HW) accelerators, as well as to ensure the portability of TINA implementations to any platform that supports such NN accelerators. Results show that TINA is highly competitive compared to alternative frameworks, specifically for complex functions with iterations. For a polyphase filter bank use case, TINA shows GPU speedups of up to 80x over a CPU baseline with NumPy, compared to the 8x speedup achieved by alternative frameworks. The framework is open source and publicly available at https://github.com/ChristiaanBoe/TINA.
{"title":"TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators","authors":"Christiaan Boerkamp, Steven van der Vlugt, Zaid Al-Ars","doi":"arxiv-2408.16551","DOIUrl":"https://doi.org/arxiv-2408.16551","url":null,"abstract":"This paper introduces TINA, a novel framework for implementing non Neural\u0000Network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs\u0000or FPGAs. The key to this approach is the concept of mapping mathematical and\u0000logic functions as a series of convolutional and fully connected layers. By\u0000mapping functions into such a small substack of NN layers, it becomes possible\u0000to execute non-NN algorithms on NN hardware (HW) accelerators efficiently, as\u0000well as to ensure the portability of TINA implementations to any platform that\u0000supports such NN accelerators. Results show that TINA is highly competitive\u0000compared to alternative frameworks, specifically for complex functions with\u0000iterations. For a Polyphase Filter Bank use case TINA shows GPU speedups of up\u0000to 80x vs a CPU baseline with NumPy compared to 8x speedup achieved by\u0000alternative frameworks. The framework is open source and publicly available at\u0000https://github.com/ChristiaanBoe/TINA.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ppOpen-AT: A Directive-based Auto-tuning Language
Takahiro Katagiri (arXiv:2408.16607, 2024-08-29)

ppOpen-AT is a domain-specific language designed to ease the workload of developers creating libraries with auto-tuning (AT) capabilities. It consists of a set of directives that allow the code necessary for AT to be generated automatically from annotations placed in the source program. This approach significantly reduces the effort required of numerical library developers. This technical report details the implementation of the AT software and its extended functions, and explains the internal specifications of ppOpen-AT.
{"title":"ppOpen-AT: A Directive-base Auto-tuning Language","authors":"Takahiro Katagiri","doi":"arxiv-2408.16607","DOIUrl":"https://doi.org/arxiv-2408.16607","url":null,"abstract":"ppOpen-AT is a domain-specific language designed to ease the workload for\u0000developers creating libraries with auto-tuning (AT) capabilities. It consists\u0000of a set of directives that allow for the automatic generation of code\u0000necessary for AT by placing annotations in the source program. This approach\u0000significantly reduces the effort required by numerical library developers. This\u0000technical report details the implementation of the AT software and its extended\u0000functions, and provides an explanation of the internal specifications of\u0000ppOpen-AT.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient k-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures
Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette (arXiv:2408.16036, 2024-08-28)

The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap of data space partitions during index construction. This overlap increases node accesses during search and retrieval, resulting in higher resource consumption and performance bottlenecks and impeding system scalability. To address this issue, we propose three innovative heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insight into spatial relationships. The distance-based method (DBM) improves efficiency by using the distance between partition centers and their radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting objects that fall into multiple partitions, delivering an intuitive view of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures
Pooja Krishan, Rohan Mohapatra, Saptarshi Sengupta (arXiv:2408.14875, 2024-08-27)

The emergence of deep learning models has revolutionized various industries over the last decade, leading to a surge in connected devices and infrastructures. However, these models can be tricked into making incorrect predictions with high confidence, leading to disastrous failures and security concerns. To this end, we explore the impact of adversarial attacks on multivariate time-series forecasting and investigate methods to counter them. Specifically, we employ untargeted white-box attacks, namely the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM), to poison the inputs to the training process, effectively misleading the model. We also illustrate the subtle modifications to the inputs after the attack, which make the attack quite difficult to detect with the naked eye. Having demonstrated the feasibility of these attacks, we develop robust models through adversarial training and model hardening. We are among the first to showcase the transferability of these attacks and defenses by extrapolating our work from the benchmark electricity data to a larger, 10-year real-world dataset used for predicting the time-to-failure of hard disks. Our experimental results confirm that the attacks and defenses achieve the desired security thresholds, leading to a 72.41% and 94.81% decrease in RMSE for the electricity and hard disk datasets, respectively, after implementing the adversarial defenses.
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler (arXiv:2408.14090, 2024-08-26)

Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks with bandwidths of up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency is challenging due to differing technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth and that many opportunities for optimization remain, ranging from the network to the software stack.
{"title":"Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects","authors":"Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler","doi":"arxiv-2408.14090","DOIUrl":"https://doi.org/arxiv-2408.14090","url":null,"abstract":"Multi-GPU nodes are increasingly common in the rapidly evolving landscape of\u0000exascale supercomputers. On these systems, GPUs on the same node are connected\u0000through dedicated networks, with bandwidths up to a few terabits per second.\u0000However, gauging performance expectations and maximizing system efficiency is\u0000challenging due to different technologies, design options, and software layers.\u0000This paper comprehensively characterizes three supercomputers - Alps, Leonardo,\u0000and LUMI - each with a unique architecture and design. We focus on performance\u0000evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using\u0000a mix of intra-node and inter-node benchmarks. By analyzing its limitations and\u0000opportunities, we aim to offer practical guidance to researchers, system\u0000architects, and software developers dealing with multi-GPU supercomputing. Our\u0000results show that there is untapped bandwidth, and there are still many\u0000opportunities for optimization, ranging from network to software optimization.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FSDEM: Feature Selection Dynamic Evaluation Metric
Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek (arXiv:2408.14234, 2024-08-26)

Expressive evaluation metrics are indispensable for informative experiments in all areas, and while several metrics are established in some areas, in others, such as feature selection, only indirect or otherwise limited evaluation metrics are available. In this paper, we propose a novel evaluation metric that addresses several problems of its predecessors and allows for flexible and reliable evaluation of feature selection algorithms. The proposed metric is a dynamic metric with two properties that can be used to evaluate both the performance and the stability of a feature selection algorithm. We conduct several empirical experiments to illustrate the use of the proposed metric in the successful evaluation of feature selection algorithms. We also provide a comparison and analysis to show the different aspects involved in evaluating feature selection algorithms. The results indicate that the proposed metric is successful in carrying out the evaluation task for feature selection algorithms. This paper is an extended version of a paper accepted at SISAP 2024.
{"title":"FSDEM: Feature Selection Dynamic Evaluation Metric","authors":"Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek","doi":"arxiv-2408.14234","DOIUrl":"https://doi.org/arxiv-2408.14234","url":null,"abstract":"Expressive evaluation metrics are indispensable for informative experiments\u0000in all areas, and while several metrics are established in some areas, in\u0000others, such as feature selection, only indirect or otherwise limited\u0000evaluation metrics are found. In this paper, we propose a novel evaluation\u0000metric to address several problems of its predecessors and allow for flexible\u0000and reliable evaluation of feature selection algorithms. The proposed metric is\u0000a dynamic metric with two properties that can be used to evaluate both the\u0000performance and the stability of a feature selection algorithm. We conduct\u0000several empirical experiments to illustrate the use of the proposed metric in\u0000the successful evaluation of feature selection algorithms. We also provide a\u0000comparison and analysis to show the different aspects involved in the\u0000evaluation of the feature selection algorithms. The results indicate that the\u0000proposed metric is successful in carrying out the evaluation task for feature\u0000selection algorithms. This paper is an extended version of a paper accepted at SISAP 2024.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O Mahony (arXiv:2408.12173, 2024-08-22)

Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the area of the Semantic Web, and are gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. To this end, we present a classification of the primary areas of knowledge graph technology that harness different hardware units to accelerate certain knowledge graph functionalities. We then describe the respective works in detail, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value to both academics and industry practitioners.
{"title":"Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments","authors":"Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O Mahony","doi":"arxiv-2408.12173","DOIUrl":"https://doi.org/arxiv-2408.12173","url":null,"abstract":"Knowledge graphs (KGs) have achieved significant attention in recent years,\u0000particularly in the area of the Semantic Web as well as gaining popularity in\u0000other application domains such as data mining and search engines.\u0000Simultaneously, there has been enormous progress in the development of\u0000different types of heterogeneous hardware, impacting the way KGs are processed.\u0000The aim of this paper is to provide a systematic literature review of knowledge\u0000graph hardware acceleration. For this, we present a classification of the\u0000primary areas in knowledge graph technology that harnesses different hardware\u0000units for accelerating certain knowledge graph functionalities. We then\u0000extensively describe respective works, focusing on how KG related schemes\u0000harness modern hardware accelerators. Based on our review, we identify various\u0000research gaps and future exploratory directions that are anticipated to be of\u0000significant value both for academics and industry practitioners.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation
Nishan Gunawardena, Gough Yumu Lui, Jeewani Anupama Ginige, Bahman Javadi (arXiv:2408.12463, 2024-08-22)

A significant limitation of current smartphone-based eye-tracking algorithms is their low accuracy when applied to video-type visual stimuli, as they are typically trained on static images. In addition, the increasing demand for real-time interactive applications such as games, VR, and AR on smartphones requires overcoming the limitations posed by resource constraints such as limited computational power, battery life, and network bandwidth. We therefore developed two new smartphone eye-tracking techniques for video-type visuals by combining Convolutional Neural Networks (CNN) with two different Recurrent Neural Networks (RNN), namely Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved average Root Mean Square Errors of 0.955 cm and 1.091 cm, respectively. To address the computational constraints of smartphones, we developed an edge intelligence architecture to enhance the performance of smartphone-based eye tracking. We applied various optimisation methods, such as quantisation and pruning, to the deep learning models to improve energy, CPU, and memory usage on edge devices, with a focus on real-time processing. Using model quantisation, the inference time of the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge devices.
{"title":"Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation","authors":"Nishan Gunawardena, Gough Yumu Lui, Jeewani Anupama Ginige, Bahman Javadi","doi":"arxiv-2408.12463","DOIUrl":"https://doi.org/arxiv-2408.12463","url":null,"abstract":"A significant limitation of current smartphone-based eye-tracking algorithms\u0000is their low accuracy when applied to video-type visual stimuli, as they are\u0000typically trained on static images. Also, the increasing demand for real-time\u0000interactive applications like games, VR, and AR on smartphones requires\u0000overcoming the limitations posed by resource constraints such as limited\u0000computational power, battery life, and network bandwidth. Therefore, we\u0000developed two new smartphone eye-tracking techniques for video-type visuals by\u0000combining Convolutional Neural Networks (CNN) with two different Recurrent\u0000Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent\u0000Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean\u0000Square Error of 0.955cm and 1.091cm, respectively. To address the computational\u0000constraints of smartphones, we developed an edge intelligence architecture to\u0000enhance the performance of smartphone-based eye tracking. We applied various\u0000optimisation methods like quantisation and pruning to deep learning models for\u0000better energy, CPU, and memory usage on edge devices, focusing on real-time\u0000processing. Using model quantisation, the model inference time in the CNN+LSTM\u0000and CNN+GRU models was reduced by 21.72% and 19.50%, respectively, on edge\u0000devices.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142195471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RAO-SS: A Prototype of Run-time Auto-tuning Facility for Sparse Direct Solvers
Takahiro Katagiri, Yoshinori Ishii, Hiroki Honda (arXiv:2408.11880, 2024-08-21)

In this paper, a run-time auto-tuning method that adapts performance parameters to the input matrices is proposed. RAO-SS (Run-time Auto-tuning Optimizer for Sparse Solvers), a prototype of auto-tuning software based on the proposed method, is also evaluated. RAO-SS is implemented with the Autopilot, which is middleware that supports run-time auto-tuning with a fuzzy-logic function. The target numerical library is SuperLU, a sparse direct solver for linear equations. The results indicate that (1) speedup factors of 1.2 on average and 3.6 at maximum over default executions were obtained, and (2) the software overhead of the Autopilot is negligible in RAO-SS.