CF Planter: A Toolset for Semi-automatic Thai Treebank Construction
Pechlada Seenual, Thodsaporn Chay-intr, T. Theeramunkong
DOI: 10.1109/ICESIT-ICICTES.2018.8442061 (https://doi.org/10.1109/ICESIT-ICICTES.2018.8442061)
Published: 2018-05-01 · Journal Article · 单片机与嵌入式系统应用, volume 9 1, pages 1-6
Citations: 2
Abstract
To speed up treebank construction, an integrated annotation tool is needed, one that includes a word segmenter, a sentence parser for initial tree suggestion, a tree visualizer, a tree-structure editor, and collaborative functions. Existing tools have not offered an integrated platform that provides preprocessing, an automated or semi-automated mechanism for parse-tree suggestion, and management of tagged corpus data. This paper presents CF Planter, a toolset for semi-automatic Thai treebank construction that consists of a word segmenter, a part-of-speech tagger, a statistical parser, and a web-based GUI for syntactic tree refinement and management. Given an input sentence, the most likely syntactic tree is automatically suggested and visualized for an annotator, who corrects it manually before it is added to the treebank repository. Whenever a new syntactic tree is added to the treebank, the repository is iteratively refined by recomputing the set of grammar rules and their probabilities. The toolset is demonstrated with several illustrations, including grammar-rule frequencies. It makes it easy for annotators to tag the tree structure of an input sentence. Finally, the automatic suggestion of syntactic trees is evaluated.
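The refinement step described in the abstract, in which grammar rules and their probabilities are recomputed each time a corrected tree enters the treebank, can be illustrated with a minimal sketch. It assumes a probabilistic context-free grammar estimated by relative frequency, which the abstract does not confirm as the authors' method. The `Treebank` class, the `productions` helper, the nested-tuple tree encoding, and the example sentences are all hypothetical and are not taken from CF Planter.

```python
from collections import Counter

# Assumption: a tree is a nested tuple (label, child, child, ...);
# a leaf child is a plain string holding a word.

def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


class Treebank:
    """Hypothetical repository that keeps rule counts up to date."""

    def __init__(self):
        self.trees = []
        self.rule_counts = Counter()  # counts of (lhs, rhs)
        self.lhs_counts = Counter()   # counts of lhs alone

    def add_tree(self, tree):
        """Store an annotator-approved tree and update the rule counts."""
        self.trees.append(tree)
        for lhs, rhs in productions(tree):
            self.rule_counts[(lhs, rhs)] += 1
            self.lhs_counts[lhs] += 1

    def rule_probabilities(self):
        """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
        return {
            (lhs, rhs): n / self.lhs_counts[lhs]
            for (lhs, rhs), n in self.rule_counts.items()
        }


bank = Treebank()
# "แมว กิน ปลา" (the cat eats fish)
bank.add_tree(("S", ("NP", ("N", "แมว")),
                    ("VP", ("V", "กิน"), ("NP", ("N", "ปลา")))))
# "แมว นอน" (the cat sleeps)
bank.add_tree(("S", ("NP", ("N", "แมว")), ("VP", ("V", "นอน"))))

for (lhs, rhs), p in sorted(bank.rule_probabilities().items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

After the second tree is added, the rule `VP -> V NP` drops from probability 1.0 to 0.5 because `VP -> V` now shares the same left-hand side. A statistical parser could use the updated table to rank candidate trees for the next sentence. Keeping running counts avoids rescanning the whole treebank on every insertion.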