In-memory computing (IMC) has been proposed to overcome the von Neumann bottleneck in data-intensive applications. However, existing IMC solutions cannot achieve both high parallelism and high flexibility, which limits their use in more general scenarios: as a highly parallel IMC design, a MAC crossbar is limited to matrix-vector multiplication, while logic-in-memory (LiM), another IMC method, is more flexible in supporting different logic functions but has low parallelism. To improve LiM parallelism, we investigate how the single-instruction, multiple-data (SIMD) model of conventional CPUs could expand the number of LiM operands processed in one cycle. The biggest challenge is the inefficiency of handling non-contiguous data in parallel, due to SIMD's (i) contiguous-address requirement, (ii) limited cache bandwidth, and (iii) large full-resolution parallel-computing overheads. This article presents GRAPHIC, the first reported in-memory SIMD architecture that solves the parallelism and irregular-data-access challenges of applying SIMD to LiM. GRAPHIC exploits content-addressable memory (CAM) and row-wise-accessible SRAM. By providing in-situ, fully parallel, low-overhead address search and cache read-compute-and-update operations, GRAPHIC accomplishes high-efficiency gather and aggregation with high parallelism, high energy efficiency, low latency, and low area overhead. Experiments on both contiguous-access and irregular-data-pattern applications show an average speedup of 5× over an iso-area AVX-like LiM, and 3-5× over the emerging CAM-based accelerators CAPE and GaaS-X in advanced technology nodes.
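The gather step that GRAPHIC accelerates in hardware can be illustrated in software. The sketch below (names and data are illustrative, not from the paper) emulates a CAM's parallel content search with a dictionary lookup, then gathers rows at the non-contiguous addresses the search returns:

```python
# Illustrative sketch, not GRAPHIC's design: a hardware CAM compares a
# query key against every stored row in parallel; here that one-shot
# search is modeled as a dict from row content to row address.

def build_cam(keys):
    """Map each key (row content) to its row address, as a CAM would."""
    return {k: addr for addr, k in enumerate(keys)}

def gather(cam, rows, query_keys):
    """Fetch the rows whose keys match the queries, in query order."""
    return [rows[cam[k]] for k in query_keys if k in cam]

keys = [0x1A, 0x2B, 0x3C, 0x4D]
rows = ["alpha", "beta", "gamma", "delta"]
cam = build_cam(keys)
print(gather(cam, rows, [0x3C, 0x1A]))  # matches sit at non-contiguous addresses
```

A conventional SIMD load would require these operands to sit at contiguous addresses; the content search sidesteps that constraint.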
"GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility"
Yiming Chen;Mingyen Lee;Guohao Dai;Mufeng Zhou;Nagadastagiri Challapalle;Tianyi Wang;Yao Yu;Yongpan Liu;Yu Wang;Huazhong Yang;Vijaykrishnan Narayanan;Xueqing Li
Pub Date: 2023-07-17. DOI: 10.1109/TETC.2023.3290683. IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 1, pp. 84-96.
Pub Date: 2023-07-13. DOI: 10.1109/TETC.2023.3293477
Xiaozhou Lu;Sunghwan Kim
As maintaining a properly balanced GC content is crucial for minimizing errors in DNA storage, constructing GC-balanced DNA codes has become an important research topic. In this article, we propose a novel code construction method based on the weight distribution of the data, which enables us to construct GC-balanced DNA codes. Additionally, we introduce a specific encoding process for both balanced and imbalanced data parts. One of the key differences between the proposed codes and existing codes is that the parity lengths of the proposed codes vary with the data parts, while the parity lengths of existing codes remain fixed. To evaluate the effectiveness of the proposed codes, we compare their average parity lengths to those of existing codes. Our results demonstrate that the proposed codes have significantly shorter average parity lengths for DNA sequences with appropriate GC contents.
"New Construction of Balanced Codes Based on Weights of Data for DNA Storage." IEEE Transactions on Emerging Topics in Computing, vol. 11, no. 4, pp. 973-984.
Pub Date: 2023-07-12. DOI: 10.1109/TETC.2023.3293426
Khakim Akhunov;Kasım Sinan Yıldırım
There is an emerging requirement to perform data-intensive parallel computations, e.g., machine-learning inference, locally on batteryless sensors. These devices are resource-constrained and operate intermittently due to irregular energy availability in the environment. Intermittent execution can lead to several side effects that prevent the correct execution of computational tasks. Even though recent studies have proposed methods to cope with these side effects and execute such tasks correctly, they overlooked the efficient intermittent execution of parallelizable, data-intensive machine-learning tasks. In this article, we present PiMCo, a novel programmable CRAM-based in-memory coprocessor that exploits the Processing In-Memory (PIM) paradigm and facilitates the power-failure-resilient execution of parallelizable computational loads. Contrary to existing PIM solutions for intermittent computing, PiMCo offers better programmability to accelerate a variety of parallelizable tasks. Our performance evaluation demonstrates that PiMCo improves the performance of existing low-power accelerators for intermittent computing by up to 8× and energy efficiency by up to 150×.
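The core hazard described here, losing progress on power failure, can be sketched in software. The following checkpointing toy is not PiMCo's hardware mechanism; a file stands in for non-volatile memory, and all names are illustrative. State is committed after each step so execution resumes instead of restarting:

```python
# Hedged sketch of intermittent execution: persist (index, partial sum)
# to "non-volatile" storage after every step, and resume from the last
# committed checkpoint if one exists.
import json
import os
import tempfile

def run_with_checkpoints(data, ckpt_path):
    """Sum a list, checkpointing progress so a power failure mid-run
    costs at most one step of rework."""
    i, acc = 0, 0
    if os.path.exists(ckpt_path):          # resuming after a failure
        with open(ckpt_path) as f:
            i, acc = json.load(f)
    while i < len(data):
        acc += data[i]
        i += 1
        with open(ckpt_path, "w") as f:    # commit before proceeding
            json.dump([i, acc], f)
    return acc

with tempfile.TemporaryDirectory() as d:
    print(run_with_checkpoints([1, 2, 3, 4], os.path.join(d, "ckpt.json")))  # 10
```

The side effects the abstract alludes to (e.g., partially applied updates becoming visible after a reboot) are exactly what such commit-before-proceed protocols guard against.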
"CRAM-Based Acceleration for Intermittent Computing of Parallelizable Tasks." IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 1, pp. 48-59.
Pub Date: 2023-07-12. DOI: 10.1109/TETC.2023.3293140
Purab Ranjan Sutradhar;Sathwika Bavikadi;Sai Manoj Pudukotai Dinakarrao;Mark A. Indovina;Amlan Ganguly
Memory-centric computing systems have demonstrated superior performance and efficiency in memory-intensive applications compared to state-of-the-art CPUs and GPUs. 3-D stacked DRAM architectures unlock higher I/O data bandwidth than traditional 2-D memory architectures and are therefore better suited for incorporating memory-centric processors. However, merely integrating high-precision ALUs in the 3-D stacked memory does not ensure an optimized design, since such a design achieves only limited utilization of the memory chip's internal bandwidth and limited operational parallelization. To address this, we propose 3DL-PIM, a 3-D stacked memory-based Processing in Memory (PIM) architecture that places numerous Look-up Table (LUT)-based low-footprint Processing Elements (PEs) within the memory banks to achieve high parallel computing performance by maximizing data-bandwidth utilization. Instead of relying on traditional logic-based ALUs, the PEs are formed by clustering groups of programmable LUTs and can therefore be programmed on-the-fly to perform various logic/arithmetic operations. Our simulations show that 3DL-PIM achieves up to 2.6× higher processing performance and up to 2.65× higher area efficiency compared to a state-of-the-art 3-D stacked memory-based accelerator.
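The LUT-based PE idea can be illustrated in a few lines. The sketch below is a software analogy, not 3DL-PIM's interface: a table-backed element is "reprogrammed" into different logic operations simply by swapping truth tables, rather than hard-wiring an ALU operation:

```python
# Illustrative sketch of a look-up-table processing element: the
# operation is data (a 4-entry truth table), so the same element can be
# reprogrammed on-the-fly.

def make_lut(truth_table):
    """Return a 2-input PE backed by a 4-entry truth table."""
    def pe(a, b):
        return truth_table[(a << 1) | b]  # inputs index the table
    return pe

xor_pe = make_lut([0, 1, 1, 0])  # program the PE as XOR
and_pe = make_lut([0, 0, 0, 1])  # reprogram the same fabric as AND
print(xor_pe(1, 0), and_pe(1, 1))  # 1 1
```

Wider operations are built by chaining such table lookups, which is why LUT clusters can stand in for logic-based ALUs at a much smaller footprint per element.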
"3DL-PIM: A Look-Up Table Oriented Programmable Processing in Memory Architecture Based on the 3-D Stacked Memory for Data-Intensive Applications." IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 1, pp. 60-72.