
J. Inf. Process.: Latest Publications

Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku
Pub Date : 2023-05-09 DOI: 10.48550/arXiv.2305.05245
Tomoyuki Tokuue, T. Ishiyama
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and corresponding partitions from different blocks are merged into a single sorted sequence. The second algorithm differs from the first only in pivot selection: binary search is used to choose pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm, with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and a selection tree for merging, shows consistently high speed and high parallel efficiency for various input data types and data sizes.
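To make the two pivot-selection schemes concrete, here is a compact Python sketch of the first scheme (an illustration only, with serial loops standing in for the multi-threaded Fugaku implementation): sort the blocks, sample pivots at regular intervals, cut each sorted block at the pivots by binary search, and merge corresponding partitions.

```python
from bisect import bisect_right

def samplesort(seq, nblocks=4):
    """Split into blocks, sort each block, pick pivots by regular sampling,
    partition every sorted block by the pivots, and merge matching partitions."""
    if not seq:
        return []
    n = len(seq)
    blocks = [sorted(seq[i * n // nblocks:(i + 1) * n // nblocks])
              for i in range(nblocks)]
    # Regular sampling: take evenly spaced elements from each sorted block.
    sample = sorted(b[j * len(b) // nblocks]
                    for b in blocks if b for j in range(1, nblocks))
    pivots = [sample[k * len(sample) // nblocks] for k in range(1, nblocks)]
    result = []
    for p in range(nblocks):
        # Locate the p-th partition of each block by binary search.
        lo = [bisect_right(b, pivots[p - 1]) if p > 0 else 0 for b in blocks]
        hi = [bisect_right(b, pivots[p]) if p < nblocks - 1 else len(b)
              for b in blocks]
        # sorted() stands in for the paper's multiway merge / selection tree.
        result.extend(sorted(x for b, l, h in zip(blocks, lo, hi)
                             for x in b[l:h]))
    return result

print(samplesort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], nblocks=3))  # [0, 1, ..., 9]
```

The second scheme would replace the regular sampling above with a binary search for pivot values that equalize the partition sizes, which aims to balance the work across threads.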
Citations: 0
Type checking data structures more complex than trees
Pub Date : 2022-09-12 DOI: 10.48550/arXiv.2209.05149
J. Sano, Naoki Yamamoto, K. Ueda
Graphs are a generalized concept that encompasses more complex data structures than trees, such as difference lists, doubly-linked lists, skip lists, and leaf-linked trees. Normally, these structures are handled with destructive assignments to heaps, which is at odds with a purely functional programming style and makes verification difficult. We propose a new purely functional language, $\lambda_{GT}$, that handles graphs as immutable, first-class data structures with a pattern-matching mechanism based on Graph Transformation, and we develop a new type system, $F_{GT}$, for the language. Our approach contrasts with the analysis of pointer-manipulating programs using separation logic, shape analysis, etc., in that (i) we do not consider destructive operations but pattern matching over graphs provided by the new higher-level language that abstracts pointers and heaps away, and (ii) we pursue what properties can be established automatically using a rather simple typing framework.
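As a loose illustration of graphs as immutable, first-class values (this sketch invents its own encoding in Python and is not $\lambda_{GT}$'s syntax, semantics, or type system), a difference list can be modeled as a set of cons atoms with a dangling hole; appending plugs one graph's head into the other's hole by constructing a new graph instead of mutating pointers:

```python
from typing import FrozenSet, Tuple

Atom = Tuple[str, str, int, str]        # ("cons", src_node, value, dst_node)
Graph = FrozenSet[Atom]

def dlist(prefix: str, values) -> Tuple[Graph, str, str]:
    """Build a difference list as an immutable graph; return (graph, head, hole)."""
    atoms, cur = [], prefix + "0"
    head = cur
    for i, v in enumerate(values):
        nxt = f"{prefix}{i + 1}"
        atoms.append(("cons", cur, v, nxt))
        cur = nxt
    return frozenset(atoms), head, cur   # `cur` is the dangling hole

def append(g1: Graph, hole1: str, g2: Graph, head2: str) -> Graph:
    """Append purely: identify g2's head with g1's hole, returning a new graph."""
    renamed = frozenset(("cons", hole1 if s == head2 else s, v, d)
                        for (_, s, v, d) in g2)
    return g1 | renamed

def to_list(g: Graph, head: str, hole: str):
    """Pattern-match the chain of cons atoms from head to hole."""
    succ = {s: (v, d) for (_, s, v, d) in g}
    out, cur = [], head
    while cur != hole:
        v, cur = succ[cur]
        out.append(v)
    return out

g1, h1, x1 = dlist("a", [1, 2])
g2, h2, x2 = dlist("b", [3])
print(to_list(append(g1, x1, g2, h2), h1, x2))  # [1, 2, 3]
```

The point of the encoding is that `append` shares both argument graphs unchanged, so the "O(1) append" character of difference lists survives without any destructive heap update.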
Citations: 0
Nearest Neighbor Non-autoregressive Text Generation
Pub Date : 2022-08-26 DOI: 10.48550/arXiv.2208.12496
Ayana Niwa, Sho Takase, Naoaki Okazaki
Non-autoregressive (NAR) models can generate sentences with less computation than autoregressive models but sacrifice generation quality. Previous studies addressed this issue through iterative decoding. This study proposes using nearest neighbors as the initial state of an NAR decoder and editing them iteratively. We present a novel training strategy to learn the edit operations on neighbors to improve NAR text generation. Experimental results show that the proposed method (NeighborEdit) achieves higher translation quality (1.69 points higher than the vanilla Transformer) with fewer decoding iterations (one-eighteenth as many iterations) on the JRC-Acquis En-De dataset, the common benchmark dataset for machine translation using nearest neighbors. We also confirm the effectiveness of the proposed method on a data-to-text task (WikiBio). In addition, the proposed method outperforms an NAR baseline on the WMT'14 En-De dataset. We also report an analysis of the neighbor examples used in the proposed method.
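A schematic view of the decoding loop may help (a sketch only: `predict_edits` is a hypothetical stand-in for the trained NAR editor, and the retrieval here uses plain string similarity rather than the paper's retrieval method):

```python
from difflib import SequenceMatcher

def retrieve_neighbor(src, corpus):
    """Return the target side of the training pair whose source is most similar."""
    best = max(corpus, key=lambda pair: SequenceMatcher(None, src, pair[0]).ratio())
    return list(best[1])

def apply_edits(tokens, edits):
    """Apply (op, pos, tok) edits right-to-left; ops: 'del', 'sub', 'ins'."""
    out = list(tokens)
    for op, pos, tok in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "del":
            del out[pos]
        elif op == "sub":
            out[pos] = tok
        else:  # "ins"
            out.insert(pos, tok)
    return out

def decode(src, corpus, predict_edits, max_iters=10):
    """NN-initialized iterative editing: start from a retrieved neighbor and
    repeatedly apply model-predicted edits until the editor proposes none."""
    hyp = retrieve_neighbor(src, corpus)
    for _ in range(max_iters):
        edits = predict_edits(src, hyp)   # hypothetical trained NAR editor
        if not edits:
            break
        hyp = apply_edits(hyp, edits)
    return hyp
```

Training the editor to produce such edit operations from neighbors is the paper's contribution; the loop above only fixes the decoding-time roles of retrieval and iterative editing.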
Citations: 5
Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation
Pub Date : 2021-12-12 DOI: 10.2197/ipsjjip.31.332
Ao Liu, Congjian Luo, Naoaki Okazaki
Logical Natural Language Generation, i.e., generating textual descriptions that can be logically entailed by a structured table, has been a challenge due to the low fidelity of the generation. Chen et al. (2020) addressed this problem by annotating interim logical programs to control the generation contents and semantics, and presented the task of table-aware logical form to text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation work, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which utilizes GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a text description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text and an LG model with both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach can effectively utilize the augmented data and outperform supervised baselines by a substantial margin.
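The semi-supervised training described above fits in a few lines (a sketch under assumed interfaces: `lg_model` and `l2t_model` with `step`/`generate` methods are hypothetical stand-ins, and the topic-conditioned GPT-2 augmenter is taken as a given producer of `aug_forms` and `aug_texts`):

```python
def joint_train(labeled, aug_forms, aug_texts, lg_model, l2t_model, epochs=10):
    """Semi-supervised loop: supervised updates on labeled pairs, plus
    back-translation in which each model pseudo-labels data for the other.
    `aug_forms`/`aug_texts` are the unpaired outputs of the (assumed)
    topic-conditioned GPT-2 augmenter."""
    for _ in range(epochs):
        for form, text in labeled:           # supervised, both directions
            l2t_model.step(src=form, tgt=text)
            lg_model.step(src=text, tgt=form)
        for form in aug_forms:               # Logic2text labels data for LG
            pseudo_text = l2t_model.generate(form)
            lg_model.step(src=pseudo_text, tgt=form)
        for text in aug_texts:               # LG labels data for Logic2text
            pseudo_form = lg_model.generate(text)
            l2t_model.step(src=pseudo_form, tgt=text)
```

The duality is what makes the augmented data usable: each direction turns the other's unpaired outputs into (pseudo-)parallel training pairs.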
Citations: 4
Work-stealing Strategies That Consider Work Amount and Hierarchy
Pub Date : 2021-06-15 DOI: 10.2197/ipsjjip.29.478
Ryusuke Nakashima, M. Yasugi, Hiroshi Yoritaka, Tasuku Hiraishi, Seiji Umatani
{"title":"Work-stealing Strategies That Consider Work Amount and Hierarchy","authors":"Ryusuke Nakashima, M. Yasugi, Hiroshi Yoritaka, Tasuku Hiraishi, Seiji Umatani","doi":"10.2197/ipsjjip.29.478","DOIUrl":"https://doi.org/10.2197/ipsjjip.29.478","url":null,"abstract":"","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131928565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Localization with Portable APs in Ultra-Narrow-Band-based LPWA Networks
Pub Date : 2021-02-15 DOI: 10.2197/ipsjjip.29.149
Miya Fukumoto, Takuya Yoshihiro
For IoT applications, LPWA is a useful communication choice that enables tiny devices spread over a wide area to connect to the Internet. Since many low-price IoT devices must work within a limited power budget, this kind of low-power, long-range communication technique is a strong tool for expanding IoT deployment. Because LPWA devices offer limited functionality, device localization is an important practical problem. UNB (Ultra Narrow Band)-based LPWA networks such as Sigfox are among the major LPWA services for IoT applications, with a communication range of more than 10 km. However, due to the long communication range and the properties of UNB-based modulation, state-of-the-art high-accuracy localization techniques cannot be used; UNB-based LPWA must rely on simple methods based on RSSI (Received Signal Strength Indicator), which involve large position estimation errors. In this paper, we propose a method to improve the accuracy of device localization in UNB-based LPWA networks by utilizing portable Access Points (APs). By introducing a distance-based weighting technique, we improve localization accuracy using a combination of stationary and portable APs. We demonstrate that the portable APs and the new weighting technique work effectively in UNB-based LPWA networks.
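A minimal sketch of distance-based weighting from RSSI, assuming a log-distance path-loss model and a weighted-centroid estimator (the paper's exact weighting formula is not reproduced here):

```python
def rssi_to_distance(rssi_dbm, p0_dbm=-30.0, path_loss_exp=2.5):
    """Invert the log-distance path-loss model: RSSI = P0 - 10*n*log10(d)."""
    return 10.0 ** ((p0_dbm - rssi_dbm) / (10.0 * path_loss_exp))

def weighted_centroid(ap_positions, rssi_values):
    """Estimate the device position as a centroid of AP positions weighted
    by 1/d, so nearer APs (stationary or portable) dominate the estimate."""
    weights = [1.0 / rssi_to_distance(r) for r in rssi_values]
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, ap_positions)) / total
    y = sum(w * py for w, (_, py) in zip(weights, ap_positions)) / total
    return x, y

# Three APs hear the device; the strongest (nearest) AP pulls the estimate.
print(weighted_centroid([(0, 0), (100, 0), (0, 100)], [-60, -80, -85]))
```

Adding portable APs increases the number of RSSI observations around the device, which is what gives the weighting more to work with.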
Citations: 1
An Adaptive Traffic Signal Control Scheme Based on Back-pressure with Global Information
Pub Date : 2021-02-15 DOI: 10.2197/ipsjjip.29.124
Arnan Maipradit, Tomoya Kawakami, Ying Liu, Juntao Gao, Minuro Ito
Nowadays, traffic congestion has increasingly become a significant problem; it lengthens travel times and aggravates air pollution. Prior work showed that back-pressure-based traffic control algorithms can effectively reduce traffic congestion. However, those works control traffic based on either inaccurate traffic information or local traffic information, which causes inefficient traffic scheduling. In this paper, we propose an adaptive traffic control algorithm based on back-pressure and Q-learning, which can efficiently reduce congestion. Our algorithm controls traffic based on accurate real-time traffic information and global traffic information learned by Q-learning. As verified by simulation, our algorithm significantly decreases average vehicle traveling time, by 17% to 38%, when compared with a state-of-the-art algorithm under the tested scenarios.
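The back-pressure core is small enough to sketch (illustrative only: the global information that the paper refines with Q-learning is reduced here to directly measured queue lengths):

```python
def pressure(phase, queue):
    """Pressure of a signal phase = sum over its permitted movements of
    (upstream queue length - downstream queue length)."""
    return sum(queue[u] - queue[d] for (u, d) in phase)

def select_phase(phases, queue):
    """Classic back-pressure control: activate the maximum-pressure phase."""
    return max(phases, key=lambda ph: pressure(ph, queue))

# Two phases over four links; queue lengths measured in vehicles.
queues = {"N_in": 12, "S_out": 2, "E_in": 5, "W_out": 4}
phases = [[("N_in", "S_out")], [("E_in", "W_out")]]
print(select_phase(phases, queues))  # north-south phase wins: pressure 10 vs 1
```

In the proposed scheme, Q-learning would supply network-wide state in place of the purely local queue snapshot used in this toy example.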
Citations: 1
Split-Paper Testing: A Novel Approach to Evaluate Programming Performance
Pub Date : 2020-10-29 DOI: 10.2197/ipsjjip.28.733
Yasuichi Nakayama, Y. Kuno, Hiroyasu Kakuda
There is a great need to evaluate and/or test programming performance. For this purpose, two schemes have been used. Constructed response (CR) tests let the examinee write programs on a blank sheet (or with a computer keyboard). This scheme can evaluate programming performance. However, it is difficult to apply at scale because skilled human graders are required (automatic evaluation has been attempted but is not widely used yet). Multiple choice (MC) tests let the examinee choose the correct answer from a list (often corresponding to the “hidden” portion of a complete program). This scheme can be used at scale with computer-based testing or mark-sense cards. However, many teachers and researchers are suspicious of it, in that a good score does not necessarily mean the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into its individual lines, shuffles the lines, adds “wrong answer” lines, and prepends them with choice symbols. The examinee answers with a list of choice symbols corresponding to the correct program, which can easily be graded automatically by computer. In particular, we propose the use of edit distance (Levenshtein distance) in the scoring scheme, which seems to have an affinity with the SP scheme. The research question is whether SP tests scored with an edit-distance-based scoring scheme measure programming performance as CR tests do. We therefore conducted an experiment in college programming classes with 60 students to compare SP tests against CR tests. As a result, SP and CR test scores are correlated across multiple settings, and the results were statistically significant. We might therefore conclude that SP tests with automatic scoring based on edit distance are a useful tool for evaluating programming performance.
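Scoring an SP answer against the key with edit distance is easy to sketch; the normalization to a 0-to-1 score below is an assumption, not the paper's exact scale:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def sp_score(answer: str, key: str) -> float:
    """Normalize to a 0..1 score; 1.0 means a perfect line ordering."""
    return 1.0 - levenshtein(answer, key) / max(len(answer), len(key))

# Each letter is the choice symbol of one program line.
print(sp_score("ABDCE", "ABCDE"))  # a swapped pair costs two unit edits: 0.6
```

The affinity with SP testing is that edit distance degrades gracefully: an answer with one misplaced line loses a little credit rather than all of it.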
Citations: 1
CENTAURUS: A Dynamic Parser Generator for Parallel Ad Hoc Data Extraction
Pub Date : 2020-10-23 DOI: 10.2197/ipsjjip.28.724
Shigeyuki Sato, Hiroka Ihara, K. Taura
It is important to handle large-scale data in text formats such as XML, JSON, and CSV because these data very often appear in data exchange. For such data, ad hoc data extraction is highly desirable as an alternative to ingesting the data into databases. The main issue in ad hoc data extraction is providing both the programmability to handle various types of data intuitively and the performance required for large-scale data. To pursue this, we developed Centaurus, a dynamic parser generator library for parallel ad hoc data extraction. This paper presents the design and implementation of Centaurus. Experimental results on ad hoc data extraction demonstrate that Centaurus outperformed fast dedicated C++ parser libraries for XML and JSON, and achieved excellent scalability with actions implemented in Python.
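As a rough illustration of parallel ad hoc extraction over a text format (not Centaurus's generated-parser approach; it assumes CSV records without embedded newlines), a file can be split into byte ranges at record boundaries, with one field extracted per range in parallel:

```python
from concurrent.futures import ProcessPoolExecutor
import csv
import os

def extract_chunk(args):
    """Extract column `col` from records whose first byte lies in [start, end)."""
    path, start, end, col = args
    out = []
    with open(path, "rb") as f:
        if start:
            f.seek(start - 1)
            f.readline()   # skip the record straddling `start`; previous chunk owns it
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            row = next(csv.reader([line.decode("utf-8", "replace")]), None)
            if row and len(row) > col:
                out.append(row[col])
    return out

def parallel_extract(path, col, workers=4):
    """Split the file into byte ranges and extract one field per range in parallel."""
    size = os.path.getsize(path)
    bounds = [size * i // workers for i in range(workers + 1)]
    tasks = [(path, bounds[i], bounds[i + 1], col) for i in range(workers)]
    with ProcessPoolExecutor(workers) as ex:
        return [v for part in ex.map(extract_chunk, tasks) for v in part]
```

The sketch conveys the setting (user-defined extraction actions over large text data, run in parallel); the library's dynamic parser generation itself is beyond a few lines.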
Citations: 0
IoT Area Network Simulator For Network Dataset Generation
Pub Date : 2020-10-15 DOI: 10.2197/ipsjjip.28.668
Van Cu Pham, Yoshiki Makino, Khoa Pho, Yuto Lim, Yasuo Tan
{"title":"IoT Area Network Simulator For Network Dataset Generation","authors":"Van Cu Pham, Yoshiki Makino, Khoa Pho, Yuto Lim, Yasuo Tan","doi":"10.2197/ipsjjip.28.668","DOIUrl":"https://doi.org/10.2197/ipsjjip.28.668","url":null,"abstract":"","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132181979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2