This paper presents the design of a Coherence-Free Processor (CFP) that enables a scalable multiprocessor by eliminating cache coherence operations in both hardware and software. The CFP uses a coherence-free cache (CFC) that can improve the cost-effectiveness and performance of existing multiprocessors for commonly used workloads. The CFC is feasible because not all program data residing in a multiprocessor cache need to be accessed by other processors, and private caches at level 1 (L1) and level 2 (L2) facilitate this form of sharing. Reentrant programs are specifically designed to protect their data from modification by other tasks, and program data that are modified but not shared with other tasks do not require a coherence protocol. Adding processors shortens the multitasking queue and thus reduces elapsed time, because simultaneous execution replaces concurrent execution.
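The core observation, that coherence is only needed for data written by one processor and visible to another, can be illustrated with a small sketch. The trace format and classification rule below are hypothetical and only model the software-level intuition, not the CFP hardware design.

    # Illustrative sketch (not the CFP hardware itself): classify memory blocks by
    # sharing behaviour to show why private, task-local data needs no coherence.
    from collections import defaultdict

    # (processor_id, block_address, is_write) access trace -- hypothetical example data.
    trace = [
        (0, 0x100, True), (0, 0x100, False),   # block written and read only by CPU 0
        (1, 0x200, True), (1, 0x200, False),   # block private to CPU 1
        (0, 0x300, False), (1, 0x300, False),  # read-only data shared by both CPUs
        (0, 0x400, True), (1, 0x400, False),   # written by CPU 0, read by CPU 1
    ]

    readers, writers = defaultdict(set), defaultdict(set)
    for cpu, block, is_write in trace:
        (writers if is_write else readers)[block].add(cpu)

    for block in sorted(set(readers) | set(writers)):
        sharers = readers[block] | writers[block]
        # Coherence is only needed when a written block is visible to another processor.
        needs_coherence = bool(writers[block]) and len(sharers) > 1
        print(hex(block), "needs coherence" if needs_coherence else "coherence-free")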
Infrastructure-as-a-Service (IaaS) cloud platforms offer resources under diverse purchasing options. Users can run an instance on the on-demand market, which is stable but expensive, or on the spot market at a significant discount. However, users have to weigh the low cost of spot instances carefully against their poor availability, since spot instances are revoked whenever a revocation event occurs. Thus, an important problem an IaaS user now faces is how to use spot instances in a cost-effective and low-risk way. Based on a replication-based fault-tolerance mechanism, we propose an online termination algorithm that optimizes the cost of using spot instances while ensuring operational stability. We prove that in most cases the cost of our online algorithm does not exceed twice the minimum cost of the optimal offline algorithm that knows the exact future a priori. Extensive experiments verify that our algorithm achieves a competitive ratio of no more than 2 in most cases and reaches the guaranteed competitive ratio in the remaining ones.
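The 2-competitive guarantee is of the same flavour as the classic ski-rental break-even rule: keep paying a small recurring cost until the total would exceed a one-off cost, then commit. The sketch below shows only that generic rule with assumed prices; it is an illustrative analogy, not the paper's actual spot-termination algorithm.

    def ski_rental_online(days_needed, rent=1.0, buy=10.0):
        """Online rule: rent until cumulative rent reaches the purchase price, then buy."""
        paid = 0.0
        for _ in range(days_needed):
            if paid + rent >= buy:
                return paid + buy          # commit to the one-off cost
            paid += rent                   # keep paying the recurring cost
        return paid

    def ski_rental_offline(days_needed, rent=1.0, buy=10.0):
        """Offline optimum that knows days_needed in advance."""
        return min(days_needed * rent, buy)

    for d in (3, 9, 10, 50):
        print(d, ski_rental_online(d) / ski_rental_offline(d))  # ratio never exceeds 2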
In recent years, live streaming has become a popular application, and it has traditionally used TCP as its primary transport protocol. The Quick UDP Internet Connections (QUIC) protocol opens up new opportunities for live streaming; however, how to leverage QUIC to transmit live video has not yet been studied. This paper first investigates the achievable quality of experience (QoE) of streaming live videos over TCP, QUIC, and their multipath extensions, Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC). We observe that MPQUIC achieves the best performance thanks to bandwidth aggregation and transmission reliability. However, network fluctuations may cause path heterogeneity, high path loss, and bandwidth degradation, resulting in significant QoE deterioration. Motivated by these observations, we investigate the multipath packet scheduling problem in live streaming and design 4D-MAP, a multipath adaptive packet scheduling scheme over QUIC. Specifically, we propose a linear upper confidence bound (LinUCB)-based online learning algorithm, along with four novel scheduling mechanisms, i.e., Dispatch, Duplicate, Discard, and Decompensate, to conquer the above problems. 4D-MAP has been evaluated in both controlled emulation and real-world networks against state-of-the-art multipath transmission schemes. Experimental results reveal that 4D-MAP outperforms the others in improving the QoE of live streaming.
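A minimal LinUCB arm-selection sketch in the spirit of the scheduler described above: each network path is an arm, its context vector holds observed features (e.g., normalized RTT and loss rate), and the path with the highest upper confidence bound gets the next packet. The feature choice and reward definition here are illustrative assumptions, not 4D-MAP's exact design.

    import numpy as np

    class LinUCBPathScheduler:
        def __init__(self, n_paths, n_features, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(n_features) for _ in range(n_paths)]      # per-arm covariance
            self.b = [np.zeros(n_features) for _ in range(n_paths)]    # per-arm reward vector

        def select(self, contexts):
            """contexts: one feature vector per path; returns the index of the chosen path."""
            scores = []
            for A, b, x in zip(self.A, self.b, contexts):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b
                ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)  # mean + exploration bonus
                scores.append(ucb)
            return int(np.argmax(scores))

        def update(self, path, context, reward):
            """Feed back the observed reward (e.g., negative delivery delay) for the chosen path."""
            self.A[path] += np.outer(context, context)
            self.b[path] += reward * context

    # Example: two paths described by [normalized RTT, loss rate].
    sched = LinUCBPathScheduler(n_paths=2, n_features=2)
    path = sched.select([np.array([0.2, 0.01]), np.array([0.8, 0.05])])
    sched.update(path, np.array([0.2, 0.01]), reward=1.0)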
Image bitmaps, i.e., data containing the pixels of a visual image, are widely used in emerging applications for pixel operations while consuming large amounts of memory space and energy. Compared with legacy DRAM (dynamic random access memory), non-volatile memories (NVMs) are suitable for bitmap storage due to their high density and intrinsic durability. However, NVM writes suffer from higher energy consumption and latency than reads. Existing precise or approximate compression schemes in NVM controllers show limited performance for bitmaps due to the irregular data patterns and variance in bitmaps. We observe pixel-level similarity when writing bitmaps, owing to the analogous contents of adjacent pixels. By exploiting this pixel-level similarity, we propose SimCom, an approximate similarity-aware compression scheme in the NVM module controller, to efficiently compress data for each write access on the fly. The idea behind SimCom is to compress runs of similar words into pairs of a base word and a run length. The storage cost of small runs is further mitigated by reusing the least significant bits of the base words. SimCom adaptively selects an appropriate compression mode for various bitmap formats, thus achieving an efficient trade-off between quality and memory performance. We implement SimCom on GEM5/zsim with NVMain and evaluate its performance with real-world image and video workloads. Our results demonstrate the efficacy and efficiency of SimCom and its favorable quality-performance trade-off.
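A simplified software model of the base-plus-run idea may help: scan the words of a write, and whenever consecutive words are "similar" to the current base word (here: equal in their upper bits), extend a run instead of storing them. The word size, similarity mask, and encoding below are illustrative assumptions, and the paper's LSB-reuse optimization for small runs is omitted.

    def compress_similar(words, mask=0xFFFFFF00):
        """Return a list of (base_word, run_length) pairs approximating the input."""
        pairs = []
        base, run = words[0], 1
        for w in words[1:]:
            if (w & mask) == (base & mask):   # approximately similar to the base word
                run += 1
            else:
                pairs.append((base, run))
                base, run = w, 1
        pairs.append((base, run))
        return pairs

    def decompress_similar(pairs):
        """Expand pairs back to words; low bits are approximated by the base word."""
        return [base for base, run in pairs for _ in range(run)]

    # Adjacent pixels are often near-identical, so runs are long and few pairs are stored.
    pixels = [0x11223344, 0x11223345, 0x11223347, 0x99887766, 0x99887766]
    print(compress_similar(pixels))   # [(0x11223344, 3), (0x99887766, 2)]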
Network embedding, as an approach to learning low-dimensional representations of nodes, has proved extremely useful in many applications, e.g., node classification and link prediction. Unfortunately, existing network embedding models are vulnerable to random or adversarial perturbations, which may degrade the performance of network embedding when it is applied to downstream tasks. To achieve robust network embedding, researchers have introduced adversarial training to regularize the embedding learning process by training on a mixture of adversarial and original examples. However, existing methods generate adversarial examples heuristically and fail to guarantee the imperceptibility of the generated examples, thus limiting the power of adversarial training. In this paper, we propose a novel method, Identity-Preserving Adversarial Training (IPAT), for network embedding, which generates imperceptible adversarial examples under an explicit identity-preserving regularization. We formalize this identity-preserving regularization as a multi-class classification problem in which each node represents a class, and we encourage each adversarial example to be classified as the class of its original node. Extensive experimental results on real-world datasets demonstrate that our proposed IPAT method significantly improves the robustness of network embedding models and the generalization of the learned node representations on various downstream tasks.
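A minimal sketch of such an identity-preserving regularizer: treat every node as its own class and penalize adversarial embeddings that a classifier cannot map back to their original node. The dimensions, the linear classifier, and the loss weight below are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    num_nodes, embed_dim = 100, 16
    node_classifier = nn.Linear(embed_dim, num_nodes)   # one class per node

    def identity_preserving_loss(adv_embeddings, node_ids):
        """Cross-entropy that encourages each adversarial embedding to be classified
        as the node it was perturbed from, keeping perturbations imperceptible."""
        logits = node_classifier(adv_embeddings)
        return nn.functional.cross_entropy(logits, node_ids)

    # Example: perturbed embeddings for nodes 0..31; the full training objective would
    # add this term, scaled by a hyper-parameter, to the usual adversarial training loss.
    adv = torch.randn(32, embed_dim)
    ids = torch.arange(32)
    print(identity_preserving_loss(adv, ids).item())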
Federated learning has emerged as a distributed learning paradigm in which each client trains locally and a parameter server aggregates the results. System heterogeneity prevents stragglers from responding to the server in time and incurs large communication costs. Although client grouping in federated learning can solve the straggler problem, the stochastic selection strategy used in client grouping neglects the data distribution within each group. Moreover, current client grouping approaches subject clients to unfair participation, leading to biased performance across clients. To guarantee fair client participation and mitigate biased local performance, we propose a federated dynamic client selection method based on data representativity (FedSDR). FedSDR clusters clients into groups according to their local computational efficiency. To estimate the significance of client datasets, we design a novel data representativity evaluation scheme based on the local data distribution. Furthermore, the two most representative clients in each group are selected to optimize the global model. Finally, the DYNAMIC-SELECT algorithm updates the local computational efficiency and data representativity states to regroup clients after each periodic average aggregation. Evaluations on real datasets show that FedSDR improves client participation by 27.4%, 37.9%, and 23.3% compared with FedAvg, TiFL, and FedSS, respectively, taking fairness into account in federated learning. In addition, FedSDR surpasses FedAvg, FedGS, and FedMS by 21.32%, 20.4%, and 6.90%, respectively, in local test accuracy variance, balancing the performance bias of the global model across clients.
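An illustrative sketch of the group-then-select step described above: clients are grouped by a computational-efficiency score, each client's representativity is scored by how closely its local label distribution matches the global one, and the two most representative clients per group are selected. The concrete metrics are assumptions; the DYNAMIC-SELECT state update is not reproduced here.

    import numpy as np

    def representativity(local_dist, global_dist):
        """Higher when the client's label distribution matches the global distribution."""
        return -np.abs(np.asarray(local_dist) - np.asarray(global_dist)).sum()

    def select_clients(clients, global_dist, n_groups=2, per_group=2):
        """clients: list of (client_id, efficiency, local_label_distribution)."""
        ordered = sorted(clients, key=lambda c: c[1])                 # group by efficiency
        groups = np.array_split(np.arange(len(ordered)), n_groups)
        selected = []
        for idx in groups:
            group = [ordered[i] for i in idx]
            group.sort(key=lambda c: representativity(c[2], global_dist), reverse=True)
            selected.extend(cid for cid, _, _ in group[:per_group])   # top-2 per group
        return selected

    global_dist = [0.5, 0.5]
    clients = [(0, 1.0, [0.9, 0.1]), (1, 1.1, [0.5, 0.5]), (2, 1.2, [0.6, 0.4]),
               (3, 5.0, [0.2, 0.8]), (4, 5.5, [0.5, 0.5]), (5, 6.0, [0.7, 0.3])]
    print(select_clients(clients, global_dist))   # e.g. [1, 2, 4, 5]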
Most distributed stream processing engines (DSPEs) do not support online task management and cannot adapt to time-varying data flows. Recently, some studies have proposed online task deployment algorithms to solve this problem. However, these approaches do not guarantee Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations caused by a change of deployment impose an exorbitant cost. We study one of the most popular DSPEs, Apache Storm, and find that when a task needs to be migrated, Storm has to stop the resource (implemented as a Worker process in Storm) where the task is deployed. This stops and restarts all tasks in that resource, resulting in poor task-migration performance. To solve this problem, we propose N-Storm (Nonstop Storm), a task-resource decoupling DSPE. N-Storm allows the tasks allocated to resources to be changed at runtime, implemented through a thread-level scheme for task migration. In particular, we add a local shared key/value store on each node to make resources aware of changes in the allocation plan, so that each resource can manage its tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Unlike traditional task deployment algorithms that deploy all tasks at once without considering the cost of the task migrations caused by a re-deployment, OTD gradually adjusts the current task deployment toward an optimized one based on the communication cost and the runtime states of resources. We demonstrate that OTD can adapt to different kinds of applications, including computation- and communication-intensive applications. Experimental results on a real DSPE cluster show that N-Storm can avoid system stops and save up to 87% of the performance degradation time compared with Apache Storm and other state-of-the-art approaches. In addition, OTD can increase the average CPU usage by 51% for computation-intensive applications and reduce network communication costs by 88% for communication-intensive applications.
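A toy sketch of the task-resource decoupling idea: a Worker polls a local shared key/value store for its current task assignment and starts or stops task threads accordingly, instead of being killed and restarted whenever the allocation plan changes. The store, polling logic, and task bodies below are illustrative assumptions, not N-Storm's actual implementation.

    import threading, time

    allocation_store = {"worker-1": {"taskA", "taskB"}}   # stands in for the local KV store
    running = {}                                          # task name -> stop flag

    def run_task(name, stop_flag):
        while not stop_flag.is_set():
            time.sleep(0.1)                               # placeholder for real tuple processing

    def reconcile(worker_id):
        """Start newly assigned tasks and stop removed ones at thread level."""
        assigned = allocation_store.get(worker_id, set())
        for name in assigned - set(running):
            flag = threading.Event()
            running[name] = flag
            threading.Thread(target=run_task, args=(name, flag), daemon=True).start()
        for name in set(running) - assigned:
            running.pop(name).set()                       # signal the task thread to stop

    reconcile("worker-1")                                 # starts taskA and taskB
    allocation_store["worker-1"] = {"taskB", "taskC"}     # scheduler migrates taskA -> taskC
    reconcile("worker-1")                                 # only taskA stops; the process survives
    print(sorted(running))                                # ['taskB', 'taskC']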
Data races are among the most important concurrency anomalies in multi-threaded programs. Emerging constraint-based techniques have been applied to race detection and can find all the races that any other sound race detector can find. However, the constraint-based approach has serious limitations in helping programmers analyze and understand data races. First, it may report a large number of false positives because it does not recognize the dataflow propagation of the program. Second, it recommends a wide range of thread context switches to schedule a reported race (including false ones) whenever the race is exposed during the constraint-solving process. This ad hoc recommendation imposes too many context switches, which complicates data race analysis. To address these two limitations of state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector that recommends each data race with minimal thread context switches. Specifically, we reduce false positives by analyzing and tracking the dataflow in the program, which allows DFTracker to avoid unnecessary analysis of false race schedules. We further propose a novel algorithm to recommend an effective race schedule with minimal thread context switches for each data race. Our experimental results on real applications demonstrate that 1) without removing any true data race, DFTracker prunes false positives by 68% compared with the state-of-the-art constraint-based race detector; and 2) DFTracker recommends as few as 2.6–8.3 (4.7 on average) thread context switches per data race in real-world applications, which is 81.6% fewer context switches per data race than the state-of-the-art constraint-based race detector. Therefore, DFTracker can serve as an effective tool for programmers to understand data races.
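A small sketch of the metric being minimized: the number of thread context switches in a schedule that exposes a race. Given candidate interleavings that all trigger the same racy pair, the one with the fewest switches is easiest to reason about. The events and candidate schedules below are illustrative; the constraint solving that produces them is not shown.

    def context_switches(schedule):
        """schedule: list of thread ids in execution order; counts adjacent thread changes."""
        return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)

    def best_schedule(candidates):
        """Pick the race-exposing schedule with the fewest context switches."""
        return min(candidates, key=context_switches)

    # Two interleavings exposing the same race between threads T1 and T2.
    candidates = [
        ["T1", "T2", "T1", "T2", "T1"],    # 4 switches
        ["T1", "T1", "T1", "T2", "T2"],    # 1 switch, easier to understand and replay
    ]
    chosen = best_schedule(candidates)
    print(chosen, context_switches(chosen))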