A new method of finding provisional boundaries of "bunsetsu" using 2nd-order Markov model
Authors: T. Araki, S. Ikehara, J. Tuchihase
Published in: Proceedings of 1993 2nd IEEE International Workshop on Robot and Human Communication, 1993-11-03
DOI: 10.1109/ROMAN.1993.367738 (https://doi.org/10.1109/ROMAN.1993.367738)
Because Japanese sentences are usually written with thousands of different characters, especially "kanji" characters, it is not easy to input them into computer files. There has been much research on methods that translate non-segmented "kana" sentences into "kanji-kana" sentences. However, the amount of computer memory required for the translation explodes, because the number of combinations of candidate "kanji-kana" words grows rapidly as the sentence gets longer. This memory explosion can be prevented if a sentence is first separated into "bunsetsu". This paper proposes a new method of finding provisional boundaries of "bunsetsu" in non-segmented "kana" sentences using 2nd-order Markov chain probabilities. The "Relevance factor" P and "Recall factor" R of the provisional "bunsetsu" boundaries determined by this method were evaluated experimentally using statistical data from 70 issues of a daily Japanese newspaper.
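The abstract does not state the decision rule, so the following is only a minimal sketch of how 2nd-order Markov chain probabilities could be used to propose provisional boundaries. It assumes that a boundary is proposed wherever the estimated probability of a character given the two preceding characters falls below a threshold, and that the "Relevance factor" P and "Recall factor" R correspond to precision and recall over boundary positions. The function names, the threshold rule, and the toy sentences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def train_trigram(sentences):
    """Count character trigrams and their two-character contexts."""
    tri, ctx = Counter(), Counter()
    for s in sentences:
        for i in range(2, len(s)):
            tri[s[i - 2:i + 1]] += 1  # (c[i-2], c[i-1], c[i])
            ctx[s[i - 2:i]] += 1      # (c[i-2], c[i-1]) followed by some character
    return tri, ctx


def cond_prob(tri, ctx, a, b, c):
    """Estimate P(c | a, b); returns 0.0 when the context (a, b) was never seen."""
    n = ctx[a + b]
    return tri[a + b + c] / n if n else 0.0


def provisional_boundaries(sentence, tri, ctx, threshold):
    """Return the set of indices i such that a boundary is proposed before sentence[i].

    Assumed rule (not from the paper): propose a boundary where the
    2nd-order transition probability is below `threshold`.
    """
    return {
        i
        for i in range(2, len(sentence))
        if cond_prob(tri, ctx, sentence[i - 2], sentence[i - 1], sentence[i]) < threshold
    }


def precision_recall(predicted, gold):
    """Precision (assumed to match P) and recall (assumed to match R)."""
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    return p, r


if __name__ == "__main__":
    # Hypothetical toy training text; the paper used newspaper statistics.
    train = ["わたしはがっこうへいく", "わたしはほんをよむ", "かれはがっこうへいく"]
    tri, ctx = train_trigram(train)

    sentence = "わたしはほんをよむ"
    gold = {4, 7}  # わたしは / ほんを / よむ
    predicted = provisional_boundaries(sentence, tri, ctx, threshold=0.75)
    print(predicted, precision_recall(predicted, gold))
```

With so little training text most transitions are either always or never observed, so the threshold here is arbitrary; on a real corpus it would have to be tuned, and unseen contexts would need smoothing rather than a probability of zero.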