Text segmentation in Polish
Pawel P. Mazur
5th International Conference on Intelligent Systems Design and Applications (ISDA'05), 2005-09-08
DOI: 10.1109/ISDA.2005.89 (https://doi.org/10.1109/ISDA.2005.89)
The paper points out the importance of text segmentation in natural language engineering and in artificial intelligence systems. It shows that in Polish every punctuation mark that can end a sentence also serves other functions within sentences. In this context, various approaches to sentence boundary disambiguation are presented. Text tokenization is analysed with the features of Polish taken into account. Based on this analysis, the paper outlines a direction for empirical research on the segmentation of Polish texts. It also presents a list of Polish abbreviations that are spelled the same as some common words.
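The ambiguity described above can be illustrated with a minimal sketch of a rule-based sentence splitter. This is not the method of the paper; it is a naive baseline written under stated assumptions. The function name `split_sentences`, the boundary rule (punctuation followed by whitespace and an uppercase letter), and the small `ABBREVIATIONS` set are all hypothetical and chosen for illustration only; the abbreviation entries are not taken from the paper's list.

```python
import re

# Hypothetical, illustrative abbreviation list; NOT the list from the paper.
ABBREVIATIONS = {"prof", "np", "ul", "godz", "tzn"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and an
# uppercase letter (including Polish diacritics).
BOUNDARY = re.compile(r"[.!?]\s+(?=[A-ZĄĆĘŁŃÓŚŹŻ])")


def split_sentences(text):
    """Split text at candidate boundaries, skipping periods that
    directly follow a word found in ABBREVIATIONS."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Word immediately before the punctuation mark.
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # treat the period as part of an abbreviation
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Spotkanie poprowadzi prof. Kowalski. Zaczynamy o godz. 10."))
    # Homograph case: "ul" is also a common noun ("beehive"), so the
    # abbreviation rule wrongly suppresses a real sentence boundary here.
    print(split_sentences("W ogrodzie stoi ul. Pszczoły latają."))
```

The second call shows why a plain abbreviation list is not enough: "ul." can abbreviate "ulica" (street), but "ul" is also an ordinary word, so the sketch keeps two sentences glued together. Resolving such cases needs context beyond the token itself, which is the kind of disambiguation the abstract refers to.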