Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku
Pub Date: 2023-05-09 | DOI: 10.48550/arXiv.2305.05245
Tomoyuki Tokuue, T. Ishiyama
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and corresponding partitions in different blocks are merged into a single sorted sequence. The second algorithm differs from the first only in how pivots are selected: a binary search is used to choose pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm, with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and a selection tree for merging, shows consistently high speed and high parallel efficiency for various input data types and data sizes.
{"title":"Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku","authors":"Tomoyuki Tokuue, T. Ishiyama","doi":"10.48550/arXiv.2305.05245","DOIUrl":"https://doi.org/10.48550/arXiv.2305.05245","url":null,"abstract":"Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and partitions in different blocks are merged into a single sorted sequence. The second algorithm differs from the first one in only selecting pivots, where the binary search is used to select pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and the selection tree for merging shows consistently high speed and high parallel efficiency for various input data types and data sizes.","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115622974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Type checking data structures more complex than trees
Pub Date: 2022-09-12 | DOI: 10.48550/arXiv.2209.05149
J. Sano, Naoki Yamamoto, K. Ueda
Graphs are a generalized concept that encompasses data structures more complex than trees, such as difference lists, doubly-linked lists, skip lists, and leaf-linked trees. Normally, these structures are handled with destructive assignments to heaps, which is at odds with a purely functional programming style and makes verification difficult. We propose a new purely functional language, $\lambda_{GT}$, that handles graphs as immutable, first-class data structures with a pattern-matching mechanism based on Graph Transformation, and we develop a new type system, $F_{GT}$, for the language. Our approach contrasts with the analysis of pointer-manipulating programs using separation logic, shape analysis, etc., in that (i) we do not consider destructive operations but rather pattern matching over graphs, provided by the new higher-level language that abstracts pointers and heaps away, and (ii) we pursue which properties can be established automatically using a rather simple typing framework.
{"title":"Type checking data structures more complex than trees","authors":"J. Sano, Naoki Yamamoto, K. Ueda","doi":"10.48550/arXiv.2209.05149","DOIUrl":"https://doi.org/10.48550/arXiv.2209.05149","url":null,"abstract":"Graphs are a generalized concept that encompasses more complex data structures than trees, such as difference lists, doubly-linked lists, skip lists, and leaf-linked trees. Normally, these structures are handled with destructive assignments to heaps, which is opposed to a purely functional programming style and makes verification difficult. We propose a new purely functional language, $lambda_{GT}$, that handles graphs as immutable, first-class data structures with a pattern matching mechanism based on Graph Transformation and developed a new type system, $F_{GT}$, for the language. Our approach is in contrast with the analysis of pointer manipulation programs using separation logic, shape analysis, etc. in that (i) we do not consider destructive operations but pattern matchings over graphs provided by the new higher-level language that abstract pointers and heaps away and that (ii) we pursue what properties can be established automatically using a rather simple typing framework.","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124625875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nearest Neighbor Non-autoregressive Text Generation
Pub Date: 2022-08-26 | DOI: 10.48550/arXiv.2208.12496
Ayana Niwa, Sho Takase, Naoaki Okazaki
Non-autoregressive (NAR) models can generate sentences with less computation than autoregressive models but sacrifice generation quality. Previous studies addressed this issue through iterative decoding. This study proposes using nearest neighbors as the initial state of an NAR decoder and editing them iteratively. We present a novel training strategy to learn the edit operations on neighbors so as to improve NAR text generation. Experimental results show that the proposed method (NeighborEdit) achieves higher translation quality (1.69 points higher than the vanilla Transformer) with fewer decoding iterations (one-eighteenth fewer iterations) on the JRC-Acquis En-De dataset, the common benchmark dataset for machine translation using nearest neighbors. We also confirm the effectiveness of the proposed method on a data-to-text task (WikiBio). In addition, the proposed method outperforms an NAR baseline on the WMT'14 En-De dataset. We also report an analysis of the neighbor examples used in the proposed method.
{"title":"Nearest Neighbor Non-autoregressive Text Generation","authors":"Ayana Niwa, Sho Takase, Naoaki Okazaki","doi":"10.48550/arXiv.2208.12496","DOIUrl":"https://doi.org/10.48550/arXiv.2208.12496","url":null,"abstract":"Non-autoregressive (NAR) models can generate sentences with less computation than autoregressive models but sacrifice generation quality. Previous studies addressed this issue through iterative decoding. This study proposes using nearest neighbors as the initial state of an NAR decoder and editing them iteratively. We present a novel training strategy to learn the edit operations on neighbors to improve NAR text generation. Experimental results show that the proposed method (NeighborEdit) achieves higher translation quality (1.69 points higher than the vanilla Transformer) with fewer decoding iterations (one-eighteenth fewer iterations) on the JRC-Acquis En-De dataset, the common benchmark dataset for machine translation using nearest neighbors. We also confirm the effectiveness of the proposed method on a data-to-text task (WikiBio). In addition, the proposed method outperforms an NAR baseline on the WMT'14 En-De dataset. We also report analysis on neighbor examples used in the proposed method.","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126896782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation
Pub Date: 2021-12-12 | DOI: 10.2197/ipsjjip.31.332
Ao Liu, Congjian Luo, Naoaki Okazaki
Logical natural language generation, i.e., generating textual descriptions that can be logically entailed by a structured table, has been challenging because of the low fidelity of the generated text. Chen et al. (2020) addressed this problem by annotating interim logical programs to control the generation content and semantics, and presented the task of table-aware logical-form-to-text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which uses GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form from the text description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text model and an LG model on both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach can effectively utilize the augmented data and outperforms supervised baselines by a substantial margin.
Localization with Portable APs in Ultra-Narrow-Band-based LPWA Networks
Pub Date: 2021-02-15 | DOI: 10.2197/ipsjjip.29.149
Miya Fukumoto, Takuya Yoshihiro
For IoT applications, LPWA is a useful communication choice that enables tiny devices spread over the land to be connected to the Internet. Since many low-price IoT devices need to work within a limited power budget, this kind of low-power, long-range communication technique is a strong tool for spreading IoT deployment. Because LPWA devices offer limited functionality, localization of devices is one of the important practical problems. UNB (Ultra Narrow Band)-based LPWA networks such as Sigfox are among the major LPWA services for IoT applications and have a communication range of more than 10 km. However, due to the long-range communication and the properties of UNB-based modulation, state-of-the-art high-accuracy localization techniques cannot be used; UNB-based LPWA must rely on simple methods based on RSSI (Received Signal Strength Indicator), which involve large position-estimation errors. In this paper, we propose a method to improve the accuracy of device localization in UNB-based LPWA networks by utilizing portable access points (APs). By introducing a distance-based weighting technique, we improve the localization accuracy with a combination of stationary and portable APs. We demonstrate that the portable APs and the new weighting technique work effectively in UNB-based LPWA networks.
An Adaptive Traffic Signal Control Scheme Based on Back-pressure with Global Information
Pub Date: 2021-02-15 | DOI: 10.2197/ipsjjip.29.124
Arnan Maipradit, Tomoya Kawakami, Ying Liu, Juntao Gao, Minuro Ito
Nowadays traffic congestion has increasingly become a significant problem, which results in longer travel times and aggravates air pollution. Previous work has shown that back-pressure-based traffic control algorithms can effectively reduce traffic congestion. However, those works control traffic based on either inaccurate traffic information or local traffic information, which causes inefficient traffic scheduling. In this paper, we propose an adaptive traffic control algorithm based on back-pressure and Q-learning, which can efficiently reduce congestion. Our algorithm controls traffic based on accurate real-time traffic information and global traffic information learned by Q-learning. As verified by simulation, our algorithm significantly decreases average vehicle traveling time, by 17% to 38%, when compared with a state-of-the-art algorithm under the tested scenarios.
{"title":"An Adaptive Traffic Signal Control Scheme Based on Back-pressure with Global Information","authors":"Arnan Maipradit, Tomoya Kawakami, Ying Liu, Juntao Gao, Minuro Ito","doi":"10.2197/ipsjjip.29.124","DOIUrl":"https://doi.org/10.2197/ipsjjip.29.124","url":null,"abstract":": Nowadays tra ffi c congestion has increasingly been a significant problem, which results in a longer travel time and aggravates air pollution. Available works showed that back-pressure based tra ffi c control algorithms can ef-fectively reduce tra ffi c congestion. However, those works control tra ffi c based on either inaccurate tra ffi c information or local tra ffi c information, which causes ine ffi cient tra ffi c scheduling. In this paper, we propose an adaptive tra ffi c control algorithm based on back-pressure and Q-learning, which can e ffi ciently reduce congestion. Our algorithm controls tra ffi c based on accurate real-time tra ffi c information and global tra ffi c information learned by Q-learning. As verified by simulation, our algorithm significantly decreases average vehicle traveling time from 17% to 38% when compared with a state-of-the-art algorithm under tested scenarios.","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Split-Paper Testing: A Novel Approach to Evaluate Programming Performance
Pub Date: 2020-10-29 | DOI: 10.2197/ipsjjip.28.733
Yasuichi Nakayama, Y. Kuno, Hiroyasu Kakuda
There is a great need to evaluate and/or test programming performance. For this purpose, two schemes have been used. Constructed response (CR) tests let the examinee write programs on a blank sheet (or with a computer keyboard). This scheme can evaluate programming performance. However, it is difficult to apply at large scale because skilled human graders are required (automatic evaluation has been attempted but is not yet widely used). Multiple choice (MC) tests let the examinee choose the correct answer from a list (often corresponding to a "hidden" portion of a complete program). This scheme can be used at large scale with computer-based testing or mark-sense cards. However, many teachers and researchers are suspicious of it, because a good score does not necessarily mean the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into its individual lines, shuffles the lines, adds "wrong answer" lines, and prepends choice symbols to them. The examinee answers with the list of choice symbols corresponding to the correct program, which can easily be graded automatically by computer. In particular, we propose the use of edit distance (Levenshtein distance) in the scoring scheme, which seems to have an affinity with the SP scheme. The research question is whether SP tests scored with an edit-distance-based scoring scheme measure programming performance as CR tests do. We therefore conducted an experiment in college programming classes with 60 students to compare SP tests against CR tests. As a result, SP and CR test scores were correlated across multiple settings, and the results were statistically significant. Therefore, we might conclude that SP tests with automatic scoring using edit distance are useful tools for evaluating programming performance.
CENTAURUS: A Dynamic Parser Generator for Parallel Ad Hoc Data Extraction
Pub Date: 2020-10-23 | DOI: 10.2197/ipsjjip.28.724
Shigeyuki Sato, Hiroka Ihara, K. Taura
It is important to handle large-scale data in text formats such as XML, JSON, and CSV because such data very often appear in data exchange. For these data, ad hoc data extraction is highly desirable instead of ingesting the data into databases. The main issue in ad hoc data extraction is to provide both the programmability to handle various types of data intuitively and the performance required for large-scale data. To pursue this, we develop Centaurus, a dynamic parser generator library for parallel ad hoc data extraction. This paper presents the design and implementation of Centaurus. Experimental results on ad hoc data extraction demonstrate that Centaurus outperformed fast dedicated parser libraries in C++ for XML and JSON, and achieved excellent scalability with actions implemented in Python.