Brill tagging on the Micron Automata Processor
Keira Zhou, J. J. Fox, Ke Wang, Donald E. Brown, K. Skadron
Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), February 2015. DOI: 10.1109/ICOSC.2015.7050812
Semantic analysis often uses a pipeline of Natural Language Processing (NLP) tools such as part-of-speech (POS) tagging. Brill tagging is a classic rule-based algorithm for POS tagging within NLP. However, implementations of the tagger are inherently slow on conventional von Neumann architectures. In this paper, we accelerate the second stage of Brill tagging on the Micron Automata Processor (AP), a new computing architecture that can perform massive pattern matching in parallel. The designed structure is tested with a subset of the Brown Corpus using 218 contextual rules. The results show a 38x speed-up for the second-stage tagger implemented on a single AP chip, compared to a single-threaded implementation on a CPU. This speed-up is linear with the number of rules, thus making large and/or complex rule sets computationally practical. This paper introduces the use of this new accelerator for computational linguistic tasks, particularly those that involve rule-based or pattern-matching approaches.
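To make the second stage concrete, the following is a minimal single-threaded sketch of contextual rule application, written in Python for illustration only. It is not the authors' implementation. The `ContextualRule` class, the two trigger templates (`PREVTAG`, `NEXTTAG`), the example rule, and the choice to test each rule against the tags as they stood at the start of that rule's pass are all assumptions; the 218 rules used in the paper are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextualRule:
    """Hypothetical rule: change from_tag to to_tag when a neighbor has context_tag."""
    from_tag: str
    to_tag: str
    trigger: str       # "PREVTAG" or "NEXTTAG" (assumed templates)
    context_tag: str


def apply_rule(tags: List[str], rule: ContextualRule) -> List[str]:
    """Apply one rule to a tagged sentence and return the new tag list.

    Matches are tested against the input tags, so a change made at one
    position does not affect matching elsewhere within the same pass.
    """
    out = list(tags)
    for i, tag in enumerate(tags):
        if tag != rule.from_tag:
            continue
        neighbor: Optional[str]
        if rule.trigger == "PREVTAG":
            neighbor = tags[i - 1] if i > 0 else None
        elif rule.trigger == "NEXTTAG":
            neighbor = tags[i + 1] if i + 1 < len(tags) else None
        else:
            raise ValueError(f"unknown trigger: {rule.trigger}")
        if neighbor == rule.context_tag:
            out[i] = rule.to_tag
    return out


def second_stage(tags: List[str], rules: List[ContextualRule]) -> List[str]:
    """Apply every rule in order; the work grows with len(rules) * len(tags)."""
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Illustrative input: words "to race the race" with made-up initial tags.
initial_tags = ["TO", "NN", "DT", "NN"]
rules = [ContextualRule("NN", "VB", "PREVTAG", "TO")]
corrected_tags = second_stage(initial_tags, rules)
```

In this sketch every rule triggers a full scan of the sentence, so the running time on a CPU grows with the number of rules. That nested loop is the pattern-matching workload the abstract describes offloading to the AP, where rule patterns can be matched in parallel.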