VPI: Vehicle Programming Interface for Vehicle Computing
Pub Date: 2024-01-30 DOI: 10.1007/s11390-024-4035-2
Bao-Fu Wu, Ren Zhong, Yuxin Wang, Jian Wan, Ji-Lin Zhang, Weisong Shi

The emergence of software-defined vehicles (SDVs), combined with autonomous driving technologies, has enabled a new era of vehicle computing (VC), in which vehicles serve as mobile computing platforms. However, the interdisciplinary complexity of automotive systems and their diverse technological requirements make developing applications for autonomous vehicles challenging. To simplify the development of applications running on SDVs, we propose a comprehensive suite of vehicle programming interfaces (VPIs). In this study, we rigorously explore the requirements for application development in VC, centering our analysis on the architecture of the Open Vehicular Data Analytics Platform (OpenVDAP). We then detail a suite of standardized VPIs spanning five critical categories: Hardware, Data, Computation, Service, and Management. To validate the design of the VPIs, we conduct experiments with the indoor autonomous vehicle Zebra and develop a prototype of OpenVDAP. Compared with the industry-influential AUTOSAR interface, our VPIs deliver significant gains in programming efficiency, marking an important advance in SDV application development. We also present a case study and evaluate its performance. Our work shows that VPIs substantially improve the efficiency of developing VC applications, meet both current and future technological demands, and propel the software-defined automotive industry toward a more interconnected and intelligent future.
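The abstract does not show the interfaces themselves; as a rough illustration of how an application might compose the five VPI categories, here is a minimal sketch in which every class and method name (Camera, DataBus, Scheduler, and so on) is a hypothetical stand-in, not the paper's actual API.

```python
# Hypothetical sketch of an application composed from the five VPI
# categories named in the abstract (Hardware, Data, Computation,
# Service, Management). Every name here is an illustrative assumption.

class Camera:                                  # Hardware: sensor access
    def read_frame(self) -> bytes:
        return b"\x00" * (640 * 480)           # placeholder frame

class DataBus:                                 # Data: on-vehicle pub/sub
    def __init__(self):
        self.topics: dict[str, list] = {}
    def publish(self, topic: str, msg) -> None:
        self.topics.setdefault(topic, []).append(msg)

class Scheduler:                               # Computation: task placement
    def submit(self, fn, *args):
        return fn(*args)                       # run locally; a real VPI
                                               # might offload to the edge

def detect_obstacles(frame: bytes) -> list:    # Service: reusable perception
    return []                                  # stub detector

def main() -> None:
    cam, bus, sched = Camera(), DataBus(), Scheduler()
    obstacles = sched.submit(detect_obstacles, cam.read_frame())
    bus.publish("perception/obstacles", obstacles)
    # a Management VPI would meter, monitor, and police this data flow

if __name__ == "__main__":
    main()
```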
{"title":"VPI: Vehicle Programming Interface for Vehicle Computing","authors":"Bao-Fu Wu, Ren Zhong, Yuxin Wang, Jian Wan, Ji-Lin Zhang, Weisong Shi","doi":"10.1007/s11390-024-4035-2","DOIUrl":"https://doi.org/10.1007/s11390-024-4035-2","url":null,"abstract":"<p>The emergence of software-defined vehicles (SDVs), combined with autonomous driving technologies, has enabled a new era of vehicle computing (VC), where vehicles serve as a mobile computing platform. However, the interdisciplinary complexities of automotive systems and diverse technological requirements make developing applications for autonomous vehicles challenging. To simplify the development of applications running on SDVs, we propose a comprehensive suite of vehicle programming interfaces (VPIs). In this study, we rigorously explore the nuanced requirements for application development within the realm of VC, centering our analysis on the architectural intricacies of the Open Vehicular Data Analytics Platform (OpenVDAP). We then detail our creation of a comprehensive suite of standardized VPIs, spanning five critical categories: Hardware, Data, Computation, Service, and Management, to address these evolving programming requirements. To validate the design of VPIs, we conduct experiments using the indoor autonomous vehicle, Zebra, and develop the OpenVDAP prototype system. By comparing it with the industry-influential AUTOSAR interface, our VPIs demonstrate significant enhancements in programming efficiency, marking an important advancement in the field of SDV application development. We also show a case study and evaluate its performance. Our work highlights that VPIs significantly enhance the efficiency of developing applications on VC. They meet both current and future technological demands and propel the software-defined automotive industry toward a more interconnected and intelligent future.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

10-Million Atoms Simulation of First-Principle Package LS3DF
Pub Date: 2024-01-30 DOI: 10.1007/s11390-023-3011-6
Yu-Jin Yan, Hai-Bo Li, Tong Zhao, Lin-Wang Wang, Lin Shi, Tao Liu, Guang-Ming Tan, Wei-Le Jia, Ning-Hui Sun
The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. On the algorithmic side, the original all-band conjugate gradient algorithm is refined for faster convergence, and mixed-precision computing is adopted to increase overall efficiency. On the system side, the original two-layer parallel structure is replaced by a coarse-grained parallel method, and optimization strategies such as multi-stream execution, kernel fusion, and redundant computation removal further increase utilization of the computational power of the heterogeneous machines. As a result, our optimized LS3DF scales to a 10-million-atom silicon system, attaining a peak performance of 34.8 PFLOPS (21.2% of the machine peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.
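As a generic illustration of the mixed-precision idea mentioned above (not the LS3DF all-band CG solver itself), the following sketch performs cheap float32 inner iterations with float64 residual corrections, a standard iterative-refinement pattern:

```python
# A minimal sketch, assuming a diagonally dominant system: run cheap
# float32 Jacobi sweeps inside, correct with float64 residuals outside.
import numpy as np

def solve_mixed_precision(A, b, inner_iters=50, outer_iters=5):
    x = np.zeros_like(b, dtype=np.float64)
    A32 = A.astype(np.float32)
    for _ in range(outer_iters):
        r = b - A @ x                          # residual in float64
        d = np.zeros(len(b), dtype=np.float32)
        r32 = r.astype(np.float32)
        for _ in range(inner_iters):           # float32 Jacobi sweeps
            d = d + (r32 - A32 @ d) / np.diag(A32)
        x = x + d.astype(np.float64)           # float64 correction step
    return x

rng = np.random.default_rng(0)
A = np.diag(np.full(100, 4.0)) + 0.01 * rng.standard_normal((100, 100))
b = rng.standard_normal(100)
x = solve_mixed_precision(A, b)
print(np.linalg.norm(A @ x - b))               # small final residual
```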
{"title":"10-Million Atoms Simulation of First-Principle Package LS3DF","authors":"Yu-Jin Yan, Hai-Bo Li, Tong Zhao, Lin-Wang Wang, Lin Shi, Tao Liu, Guang-Ming Tan, Wei-Le Jia, Ning-Hui Sun","doi":"10.1007/s11390-023-3011-6","DOIUrl":"https://doi.org/10.1007/s11390-023-3011-6","url":null,"abstract":"<p>The growing demand for semiconductor devices simulation poses a big challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to increase further utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a 10-million silicon atoms system, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to the next-generation supercomputers for larger simulations.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

SMEC: Scene Mining for E-Commerce
Pub Date: 2024-01-30 DOI: 10.1007/s11390-021-1277-0
Gang Wang, Xiang Li, Zi-Yi Guo, Da-Wei Yin, Shuai Ma
Scene-based recommendation has proven its usefulness in E-commerce by recommending commodities that fit a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN) whose nodes and links represent different types of objects and the relationships between them, respectively. We then formulate scene mining for E-commerce as an unsupervised learning problem that finds overlapping clusters of commodity categories in the HIN. To solve the problem, we propose SMEC (Scene Mining for E-Commerce), a method based on non-negative matrix factorization, and theoretically prove its convergence. Using six real-world E-commerce datasets, we conduct an extensive experimental study evaluating SMEC against 13 other methods, and show that SMEC consistently outperforms its competitors on various evaluation measures.
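To make the factorization idea concrete, here is a toy sketch, under the simplifying assumption of a plain category co-occurrence matrix rather than SMEC's heterogeneous information network: NMF factor loadings are thresholded to read off overlapping "scenes".

```python
# A minimal sketch of overlapping clustering via NMF. The categories
# and co-occurrence counts are invented for illustration.
import numpy as np
from sklearn.decomposition import NMF

categories = ["tent", "sleeping_bag", "stove", "phone", "charger", "case"]
X = np.array([[0, 9, 7, 0, 0, 0],      # toy co-occurrence counts
              [9, 0, 6, 0, 0, 0],
              [7, 6, 0, 1, 0, 0],
              [0, 0, 1, 0, 8, 9],
              [0, 0, 0, 8, 0, 7],
              [0, 0, 0, 9, 7, 0]], dtype=float)

W = NMF(n_components=2, init="nndsvda", random_state=0).fit_transform(X)
threshold = 0.5 * W.max()
for k in range(W.shape[1]):
    scene = [c for c, w in zip(categories, W[:, k]) if w > threshold]
    print(f"scene {k}: {scene}")   # a category may belong to many scenes
```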
{"title":"SMEC: Scene Mining for E-Commerce","authors":"Gang Wang, Xiang Li, Zi-Yi Guo, Da-Wei Yin, Shuai Ma","doi":"10.1007/s11390-021-1277-0","DOIUrl":"https://doi.org/10.1007/s11390-021-1277-0","url":null,"abstract":"<p>Scene-based recommendation has proven its usefulness in E-commerce, by recommending commodities based on a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery for E-commerce. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN), whose nodes and links represent different types of objects and different types of relationships between objects, respectively. We then formulate the scene mining problem for E-commerce as an unsupervised learning problem that finds the overlapping clusters of commodity categories in the HIN. To solve the problem, we propose a non-negative matrix factorization based method SMEC (Scene Mining for E-Commerce), and theoretically prove its convergence. Using six real-world E-commerce datasets, we finally conduct an extensive experimental study to evaluate SMEC against 13 other methods, and show that SMEC consistently outperforms its competitors with regard to various evaluation measures.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140602367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

DIR: Dynamic Request Interleaving for Improving the Read Performance of Aged Solid-State Drives
Pub Date: 2024-01-30 DOI: 10.1007/s11390-023-1601-y
Shi-Qiang Nie, Chi Zhang, Wei-Guo Wu
Triple-level cell (TLC) NAND flash is increasingly adopted to build solid-state drives (SSDs) for modern computer systems. While TLC NAND flash effectively improves storage density, it faces severe reliability issues; in particular, different page types exhibit different raw bit error rates (RBERs). Integrating strong low-density parity-check (LDPC) codes improves reliability but prolongs read latency in proportion to the multiple read retries needed for worse pages. The straightforward idea is that dispersing page-sized data across several pages of different types can achieve a lower average RBER and thus reduce read latency. However, directly implementing this idea in the flash translation layer (FTL) induces read amplification, because a logical page residing in more than one physical page requires several read operations. In this paper, we propose Dynamic Request Interleaving (DIR), a technique for improving the performance of TLC NAND flash-based SSDs, in particular aged ones with large RBERs. DIR exploits the observation that the latency of an I/O request is determined, queuing time aside, by the access to the slowest device page, i.e., the page with the highest RBER. By grouping consecutive logical pages that have high locality and interleaving their encoded data across device page types with different RBERs, DIR effectively reduces the number of LDPC read retries with limited read amplification. To meet the requirement of allocating hybrid page types for interleaved data, we also design a page-interleaving-friendly page allocation scheme, which splits the planes into multi-plane regions that store interleaved data and single-plane regions that store normal data. Pages in a multi-plane region can be read and written in parallel by the proposed multi-plane command, avoiding the read amplification issue. Based on the DIR scheme and the proposed page allocation scheme, we build a DIR-enabled FTL that integrates both with modest modifications. Our experimental results show that adopting DIR in aged SSDs exploits nearly 33% of the locality in I/O requests and reduces read latency by 43% on average compared with conventional aged SSDs.
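A back-of-the-envelope sketch of why interleaving helps: striping a codeword across page types averages their RBERs, so the worst page no longer dictates LDPC read retries. The RBER values below are made up for illustration, not measurements from the paper.

```python
# Assumed per-page-type raw bit error rates (illustrative only).
lsb, csb, msb = 1e-4, 5e-4, 2e-3

worst_case = msb                        # whole codeword on the worst page
interleaved = (lsb + csb + msb) / 3     # codeword striped evenly across types

print(f"worst page RBER:  {worst_case:.2e}")
print(f"interleaved RBER: {interleaved:.2e}")   # ~2.3x lower in this toy case
```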
{"title":"DIR: Dynamic Request Interleaving for Improving the Read Performance of Aged Solid-State Drives","authors":"Shi-Qiang Nie, Chi Zhang, Wei-Guo Wu","doi":"10.1007/s11390-023-1601-y","DOIUrl":"https://doi.org/10.1007/s11390-023-1601-y","url":null,"abstract":"<p>Triple-level cell (TLC) NAND flash is increasingly adopted to build solid-state drives (SSDs) for modern computer systems. While TLC NAND flash effectively improves storage density, it faces severe reliability issues; in particular, the pages exhibit different raw bit error rates (RBERs). Integrating strong low-density parity-check (LDPC) code helps to improve reliability but suffers from prolonged and proportional read latency due to multiple read retries for worse pages. The straightforward idea is that dispersing page-size data across several pages in different types can achieve a lower average RBER and reduce the read latency. However, directly implementing this simple idea into flash translation layer (FTL) induces the read amplification issue as one logic page residing in more than one physical page brings several read operations. In this paper, we propose the Dynamic Request Interleaving (DIR) technology for improving the performance of TLC NAND flash-based SSDs, in particular, the aged ones with large RBERs. DIR exploits the observation that the latency of an I/O request is determined, without considering the queuing time, by the access of the slowest device page, i.e., the page that has the highest RBER. By grouping consecutive logical pages that have high locality and interleaving their encoded data in different types of device pages that have different RBERs, DIR effectively reduces the number of read retries for LDPC with limited read amplification. To meet the requirement of allocating hybrid page types for interleaved data, we also design a page-interleaving friendly page allocation scheme, which splits all the planes into multi-plane regions for storing the interleaved data and single-plane regions for storing the normal data. The pages in the multi-plane region can be read/written in parallel by the proposed multi-plane command and avoid the read amplification issue. Based on the DIR scheme and the proposed page allocation scheme, we build DIR-enable FTL, which integrates the proposed schemes into the FTL with some modifications. Our experimental results show that adopting DIR in aged SSDs exploits nearly 33% locality from I/O requests and, on average, reduces 43% read latency over conventional aged SSDs.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Research on General-Purpose Brain-Inspired Computing Systems
Pub Date: 2024-01-30 DOI: 10.1007/s11390-023-4002-3
Peng Qu, Xing-Long Ji, Jia-Jie Chen, Meng Pang, Yu-Chen Li, Xiao-Yi Liu, You-Hui Zhang

Brain-inspired computing is a new technology that draws on the principles of brain science and is oriented toward the efficient development of artificial general intelligence (AGI); a brain-inspired computing system is a hierarchical system composed of neuromorphic chips, basic software and hardware, and the algorithms/applications that embody this technology. While the field is developing rapidly, it faces various challenges and opportunities brought by interdisciplinary research, including the issue of software and hardware fragmentation. This paper analyzes the status quo of brain-inspired computing systems. Enlightened by design principles and methodologies of general-purpose computers, we propose constructing “general-purpose” brain-inspired computing systems: hierarchies built on the design philosophy of decoupling software from hardware, which can flexibly support various brain-inspired computing applications and neuromorphic chips with different architectures. We further introduce our recent work in these areas, including ANN (artificial neural network)/SNN (spiking neural network) development tools, a hardware-agnostic compilation infrastructure, and a chip micro-architecture that combines high programming flexibility with high performance. These studies show that a “general-purpose” system can remarkably improve the efficiency of application development and enhance the productivity of basic software, thereby helping to accelerate the advancement of various brain-inspired algorithms and applications. We believe this is the key to collaborative research and development, to the co-evolution of applications, basic software, and chips in this field, and to building a favorable software/hardware ecosystem for brain-inspired computing.
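A minimal sketch of the software/hardware decoupling argued for above: applications target one abstract interface, and per-chip backends are swapped in underneath. The backend classes and the network IR format are hypothetical illustrations, not the authors' toolchain.

```python
# Illustrative decoupling: application code never names a concrete chip.
from abc import ABC, abstractmethod

class NeuromorphicBackend(ABC):
    @abstractmethod
    def compile(self, network_ir: dict) -> str: ...

class ChipABackend(NeuromorphicBackend):
    def compile(self, network_ir):
        return f"chipA binary for {len(network_ir['layers'])} layers"

class ChipBBackend(NeuromorphicBackend):
    def compile(self, network_ir):
        return f"chipB config for {len(network_ir['layers'])} layers"

def deploy(network_ir: dict, backend: NeuromorphicBackend) -> str:
    return backend.compile(network_ir)   # one entry point, many chips

snn = {"layers": [{"type": "lif", "n": 128}, {"type": "lif", "n": 10}]}
print(deploy(snn, ChipABackend()))
print(deploy(snn, ChipBBackend()))
```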
{"title":"Research on General-Purpose Brain-Inspired Computing Systems","authors":"Peng Qu, Xing-Long Ji, Jia-Jie Chen, Meng Pang, Yu-Chen Li, Xiao-Yi Liu, You-Hui Zhang","doi":"10.1007/s11390-023-4002-3","DOIUrl":"https://doi.org/10.1007/s11390-023-4002-3","url":null,"abstract":"<p>Brain-inspired computing is a new technology that draws on the principles of brain science and is oriented to the efficient development of artificial general intelligence (AGI), and a brain-inspired computing system is a hierarchical system composed of neuromorphic chips, basic software and hardware, and algorithms/applications that embody this technology. While the system is developing rapidly, it faces various challenges and opportunities brought by interdisciplinary research, including the issue of software and hardware fragmentation. This paper analyzes the status quo of brain-inspired computing systems. Enlightened by some design principle and methodology of general-purpose computers, it is proposed to construct “general-purpose” brain-inspired computing systems. A general-purpose brain-inspired computing system refers to a brain-inspired computing hierarchy constructed based on the design philosophy of decoupling software and hardware, which can flexibly support various brain-inspired computing applications and neuromorphic chips with different architectures. Further, this paper introduces our recent work in these aspects, including the ANN (artificial neural network)/SNN (spiking neural network) development tools, the hardware agnostic compilation infrastructure, and the chip micro-architecture with high flexibility of programming and high performance; these studies show that the “general-purpose” system can remarkably improve the efficiency of application development and enhance the productivity of basic software, thereby being conductive to accelerating the advancement of various brain-inspired algorithms and applications. We believe that this is the key to the collaborative research and development, and the evolution of applications, basic software and chips in this field, and conducive to building a favorable software/hardware ecosystem of brain-inspired computing.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Motion-Inspired Real-Time Garment Synthesis with Temporal-Consistency
Pub Date: 2023-12-01 DOI: 10.1007/s11390-022-1887-1
Synthesizing garment dynamics according to body motions is a vital technique in computer graphics. Physics-based simulation depends on an accurate model of the kinetics of cloth, which is time-consuming, hard to implement, and complex to control. Existing data-driven approaches either lack temporal consistency or fail to handle garments whose topology differs from the body's. In this paper, we present a motion-inspired real-time garment synthesis workflow that enables high-level control of garment shape. Given a sequence of body motions, our workflow generates corresponding garment dynamics with both spatial and temporal coherence. To that end, we develop a transformer-based garment synthesis network to learn the mapping from body motions to garment dynamics, employing frame-level attention to capture the dependency between garments and body motions. A post-processing procedure then performs penetration removal and auto-texturing, yielding textured clothing animation that is collision-free and temporally consistent. We evaluated the proposed workflow quantitatively and qualitatively from different aspects. Extensive experiments demonstrate that our network delivers clothing dynamics that retain the wrinkles of physics-based simulation while running 1 000 times faster. Our workflow also achieves superior synthesis performance compared with alternative approaches. To stimulate further research in this direction, our code will be publicly available soon.
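To make the architecture description concrete, here is a minimal sketch of a transformer encoder mapping a pose sequence to per-frame garment vertex offsets; all dimensions, layer counts, and names are illustrative assumptions, not the paper's configuration.

```python
# Sketch: frame-level attention over a body-motion sequence, regressing
# per-frame, per-vertex garment offsets. Sizes are placeholders.
import torch
import torch.nn as nn

class MotionToGarment(nn.Module):
    def __init__(self, pose_dim=72, n_verts=4000, d_model=256):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_verts * 3)

    def forward(self, poses):                  # (batch, frames, pose_dim)
        h = self.encoder(self.embed(poses))    # frame-level attention
        offsets = self.head(h)                 # (batch, frames, n_verts*3)
        return offsets.view(*poses.shape[:2], -1, 3)

model = MotionToGarment()
motion = torch.randn(1, 30, 72)                # 30 frames of body poses
print(model(motion).shape)                     # torch.Size([1, 30, 4000, 3])
```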
{"title":"Motion-Inspired Real-Time Garment Synthesis with Temporal-Consistency","authors":"","doi":"10.1007/s11390-022-1887-1","DOIUrl":"https://doi.org/10.1007/s11390-022-1887-1","url":null,"abstract":"<h3>Abstract</h3> <p>Synthesizing garment dynamics according to body motions is a vital technique in computer graphics. Physics-based simulation depends on an accurate model of the law of kinetics of cloth, which is time-consuming, hard to implement, and complex to control. Existing data-driven approaches either lack temporal consistency, or fail to handle garments that are different from body topology. In this paper, we present a motion-inspired real-time garment synthesis workflow that enables high-level control of garment shape. Given a sequence of body motions, our workflow is able to generate corresponding garment dynamics with both spatial and temporal coherence. To that end, we develop a transformerbased garment synthesis network to learn the mapping from body motions to garment dynamics. Frame-level attention is employed to capture the dependency of garments and body motions. Moreover, a post-processing procedure is further taken to perform penetration removal and auto-texturing. Then, textured clothing animation that is collision-free and temporally-consistent is generated. We quantitatively and qualitatively evaluated our proposed workflow from different aspects. Extensive experiments demonstrate that our network is able to deliver clothing dynamics which retain the wrinkles from the physics-based simulation, while running 1 000 times faster. Besides, our workflow achieved superior synthesis performance compared with alternative approaches. To stimulate further research in this direction, our code will be publicly available soon.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139656281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Automatic Target Description File Generation
Pub Date: 2023-12-01 DOI: 10.1007/s11390-022-1919-x
Agile hardware design is gaining momentum and bringing new chips to market faster and in larger quantities. However, it also poses new challenges for compiler developers, who must retarget existing compilers to these new chips in less time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend, to a new target requires compiler developers to manually write a set of target description files (totalling 10 300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of existing targets with similar instruction set architectures (ISAs). Second, ATG asks compiler developers to fill in the information for each instruction of the new target in tabular form, according to the synthesized list of target-specific properties, and then generates the target description files automatically according to the synthesized code-layout templates. The first stage can often be reused across new targets that share similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1 029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing RISC-V backends, but with significantly less development effort (each instruction is specified in terms of at most 61 target-specific properties).
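A minimal sketch of the second stage as described: developers fill in per-instruction properties in tabular form, and the tool renders description records from code-layout templates. The record syntax below is a simplified invention for illustration, not LLVM TableGen or ATG's actual output format.

```python
# Sketch: tabular instruction properties rendered through templates.
# Field names, encodings, and the record syntax are all assumptions.
instructions = [
    {"name": "ADD", "fmt": "R", "opcode": "0110011",
     "funct3": "000", "funct7": "0000000"},
    {"name": "ADDI", "fmt": "I", "opcode": "0010011", "funct3": "000"},
]

templates = {   # one code-layout template per instruction format
    "R": 'def {name} : RType<"{name}", {opcode}, {funct3}, {funct7}>;',
    "I": 'def {name} : IType<"{name}", {opcode}, {funct3}>;',
}

def generate(instrs):
    return "\n".join(templates[i["fmt"]].format(**i) for i in instrs)

print(generate(instructions))
```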
{"title":"Automatic Target Description File Generation","authors":"","doi":"10.1007/s11390-022-1919-x","DOIUrl":"https://doi.org/10.1007/s11390-022-1919-x","url":null,"abstract":"<h3>Abstract</h3> <p>Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster. However, it also takes new challenges for compiler developers to retarget existing compilers to these new chips in shorter time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend to a new target, requires compiler developers to write manually a set of target description files (totalling 10 300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures (ISAs). Second, ATG requests compiler developers to fill in the information for each instruction in the new target in tabular form according to the list of target-specific properties synthesized and then generates its target description files automatically according to the list of code-layout templates synthesized. The first stage can often be reused by different new targets sharing similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1 029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort (by specifying each instruction in terms of up to 61 target-specific properties only).</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139656170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy
Pub Date: 2023-12-01 DOI: 10.1007/s11390-023-1346-7
Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Because each user transaction represents a set of items, most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies. The current state-of-the-art approach, set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results; an unbiased FIM approach with lower variance is therefore highly desirable. To narrow this gap, we propose an item-level LDP frequency oracle, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is applied to set values: all items are encoded into a fixed vector, to which perturbation can subsequently be applied. We further propose an FIM approach, optimized united itemset mining (O-UISM), which combines the padding-and-sampling-based frequency oracle (PSFO) and IHFO in a single framework for acquiring accurate frequent itemsets and their frequencies. Finally, we demonstrate both theoretically and experimentally that O-UISM significantly outperforms extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
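To illustrate the flavor of a Hadamard-based LDP frequency oracle, here is a sketch of generic Hadamard randomized response: each user reports one randomly chosen, randomized Hadamard coefficient of their item, and the server debiases and inverts the transform. This is an assumption-laden toy, not necessarily IHFO's exact mechanism.

```python
# Minimal Hadamard randomized-response frequency oracle (illustrative).
import numpy as np

def H(j, v):
    # entry (j, v) of the 2^m x 2^m Hadamard matrix: (-1)^popcount(j & v)
    return 1 - 2 * (bin(j & v).count("1") & 1)

def perturb(v, m, eps, rng):
    j = int(rng.integers(2 ** m))            # sampled coefficient index
    p = np.exp(eps) / (np.exp(eps) + 1)      # keep-probability
    bit = H(j, v) if rng.random() < p else -H(j, v)
    return j, bit

def estimate(reports, m, eps):
    D = 2 ** m
    p = np.exp(eps) / (np.exp(eps) + 1)
    sums, counts = np.zeros(D), np.zeros(D)
    for j, bit in reports:
        sums[j] += bit
        counts[j] += 1
    coef = np.divide(sums, counts, out=np.zeros(D), where=counts > 0)
    coef /= 2 * p - 1                        # debias randomized response
    Hm = np.array([[H(r, c) for c in range(D)] for r in range(D)])
    return Hm @ coef / D                     # invert the Hadamard transform

rng = np.random.default_rng(0)
m, eps, n = 4, 2.0, 200_000
true_items = rng.choice(2 ** m, size=n, p=np.r_[0.5, np.full(15, 0.5 / 15)])
reports = [perturb(int(v), m, eps, rng) for v in true_items]
print(np.round(estimate(reports, m, eps), 3))  # item 0 should be near 0.5
```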
{"title":"Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy","authors":"","doi":"10.1007/s11390-023-1346-7","DOIUrl":"https://doi.org/10.1007/s11390-023-1346-7","url":null,"abstract":"<h3>Abstract</h3> <p>Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139659401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

2k-Vertex Kernels for Cluster Deletion and Strong Triadic Closure
Pub Date: 2023-11-30 DOI: 10.1007/s11390-023-1420-1
Wen-Yu Gao, Hang Gao
Cluster deletion and strong triadic closure are two important NP-complete problems that have received significant attention due to their applications in various areas, including social networks and data analysis. Although cluster deletion and strong triadic closure are closely linked by induced paths on three vertices, there are subtle differences between them, and in some cases their solutions differ considerably. In this paper, we study parameterized algorithms for these two problems, focusing on their kernels. Instead of separating the critical clique and its neighbors for analysis, we consider them as a whole, which allows us to bound the number of related vertices more effectively. In addition, in analyzing the kernel of strong triadic closure, we introduce the concept of edge-disjoint induced paths on three vertices, which enables us to obtain a lower bound on the number of weak edges in a more concise way. Our analysis demonstrates that cluster deletion and strong triadic closure both admit 2k-vertex kernels. These results improve on the previously best-known kernels for both problems. Furthermore, our analysis provides additional insights into the relationship between cluster deletion and strong triadic closure.
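The edge-disjoint lower-bound idea can be made concrete with a small sketch: greedily collect induced paths on three vertices (P3s) that share no edge. Since any cluster-deletion or STC solution must pay at least once per induced P3, and no two collected P3s share an edge, the number collected lower-bounds the solution size. This toy version is illustrative only.

```python
# Greedily collect edge-disjoint induced P3s (a-v-b with a,b non-adjacent).
import networkx as nx
from itertools import combinations

def greedy_edge_disjoint_p3(G):
    used, p3s = set(), []
    for v in G:
        for a, b in combinations(G[v], 2):
            if G.has_edge(a, b):
                continue                     # a-v-b is not an induced path
            e1, e2 = frozenset((a, v)), frozenset((v, b))
            if used & {e1, e2}:
                continue                     # one of its edges is spent
            used |= {e1, e2}
            p3s.append((a, v, b))
    return p3s

G = nx.path_graph(6)                         # 0-1-2-3-4-5
p3s = greedy_edge_disjoint_p3(G)
print(len(p3s), "edge-disjoint induced P3s:", p3s)   # 2: (0,1,2), (2,3,4)
```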
{"title":"2k-Vertex Kernels for Cluster Deletion and Strong Triadic Closure","authors":"Wen-Yu Gao, Hang Gao","doi":"10.1007/s11390-023-1420-1","DOIUrl":"https://doi.org/10.1007/s11390-023-1420-1","url":null,"abstract":"<p>Cluster deletion and strong triadic closure are two important NP-complete problems that have received significant attention due to their applications in various areas, including social networks and data analysis. Although cluster deletion and strong triadic closure are closely linked by induced paths on three vertices, there are subtle differences between them. In some cases, the solutions of strong triadic closure and cluster deletion are quite different. In this paper, we study the parameterized algorithms for these two problems. More specifically, we focus on the kernels of these two problems. Instead of separating the critical clique and its neighbors for analysis, we consider them as a whole, which allows us to more effectively bound the number of related vertices. In addition, in analyzing the kernel of strong triadic closure, we introduce the concept of edge-disjoint induced path on three vertices, which enables us to obtain the lower bound of weak edge number in a more concise way. Our analysis demonstrates that cluster deletion and strong triadic closure both admit 2<i>k</i>-vertex kernels. These results represent improvements over previously best-known kernels for both problems. Furthermore, our analysis provides additional insights into the relationship between cluster deletion and strong triadic closure.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139656204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Composing Like an Ancient Chinese Poet: Learn to Generate Rhythmic Chinese Poetry
Pub Date: 2023-11-30 DOI: 10.1007/s11390-023-1295-1
Ming He, Yan Chen, Hong-Ke Zhao, Qi Liu, Le Wu, Yu Cui, Gui-Hua Zeng, Gui-Quan Liu
Automatic generation of Chinese classical poetry remains a challenging problem in artificial intelligence. Recently, encoder-decoder models have provided a few viable methods for poetry generation. However, reviewing prior methods reveals two major issues that still need to be settled: 1) most of them are one-stage generation methods without further polishing; 2) they rarely take into account the restrictions of poetry, such as tone and rhyme. Intuitively, some ancient Chinese poets tended to first write a coarse poem satisfying the aesthetic constraints and then deliberate over its semantics, while others first created a semantically complete poem and then refined its aesthetics. On this basis, to better imitate how humans create poems, we propose a two-stage (restricted polishing) generation method in which each stage focuses on a different aspect of the poem (semantics or aesthetics), producing generated poems of higher quality. The two-stage method accordingly develops into two symmetrical generation procedures: aesthetics-to-semantics and semantics-to-aesthetics. In particular, we design a sampling method and a gate to formulate the tone and rhyme restrictions, which further improve the rhythm of the generated poems. Experimental results demonstrate the superiority of our two-stage method over baselines in both automatic and human evaluation metrics, especially in yielding consistent improvements in tone and rhyme.
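As a toy illustration of the rhyme "gate" idea, the following sketch masks a sampling distribution at a line-final position so that only tokens in the required rhyme group survive. The six-character vocabulary, rhyme table, and scores are invented assumptions, not the paper's model.

```python
# Sketch: constrained sampling with a rhyme gate at line-final positions.
import numpy as np

vocab = ["山", "川", "月", "雪", "天", "年"]
rhyme_group = {"山": "an", "川": "an", "月": "ue", "雪": "ue",
               "天": "an", "年": "an"}

def gated_sample(logits, required_rhyme=None, rng=None):
    rng = rng or np.random.default_rng()
    mask = np.zeros_like(logits)
    if required_rhyme is not None:           # line-final position: gate on
        mask = np.array([0.0 if rhyme_group[w] == required_rhyme
                         else -np.inf for w in vocab])
    p = np.exp(logits + mask)                # -inf mask zeroes out tokens
    p /= p.sum()
    return vocab[rng.choice(len(vocab), p=p)]

logits = np.array([1.0, 0.5, 2.0, 1.5, 0.2, 0.1])  # fake decoder scores
print(gated_sample(logits))                        # unconstrained token
print(gated_sample(logits, required_rhyme="an"))   # must rhyme with "an"
```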
{"title":"Composing Like an Ancient Chinese Poet: Learn to Generate Rhythmic Chinese Poetry","authors":"Ming He, Yan Chen, Hong-Ke Zhao, Qi Liu, Le Wu, Yu Cui, Gui-Hua Zeng, Gui-Quan Liu","doi":"10.1007/s11390-023-1295-1","DOIUrl":"https://doi.org/10.1007/s11390-023-1295-1","url":null,"abstract":"<p>Automatic generation of Chinese classical poetry is still a challenging problem in artificial intelligence. Recently, Encoder-Decoder models have provided a few viable methods for poetry generation. However, by reviewing the prior methods, two major issues still need to be settled: 1) most of them are one-stage generation methods without further polishing; 2) they rarely take into consideration the restrictions of poetry, such as tone and rhyme. Intuitively, some ancient Chinese poets tended first to write a coarse poem underlying aesthetics and then deliberated its semantics; while others first create a semantic poem and then refine its aesthetics. On this basis, in order to better imitate the human creation procedure of poems, we propose a two-stage method (i.e., restricted polishing generation method) of which each stage focuses on the different aspects of poems (i.e., semantics and aesthetics), which can produce a higher quality of generated poems. In this way, the two-stage method develops into two symmetrical generation methods, the aesthetics-to-semantics method and the semantics-to-aesthetics method. In particular, we design a sampling method and a gate to formulate the tone and rhyme restrictions, which can further improve the rhythm of the generated poems. Experimental results demonstrate the superiority of our proposed two-stage method in both automatic evaluation metrics and human evaluation metrics compared with baselines, especially in yielding consistent improvements in tone and rhyme.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139657219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}