About the possibility of determining the prefix and suffix of a word by subwords of fixed length
G. Zhukova, Y. Smetanin, M. Ulyanov
Biznes Informatika-Business Informatics, published 2020-06-30
DOI: 10.17323/2587-814x.2020.2.84.92 (https://doi.org/10.17323/2587-814x.2020.2.84.92)
In applied problems of business informatics related to data analysis (in particular, the analysis and forecasting of time series, the study of log files of business processes, etc.), problems of qualitative analysis arise. Qualitative analysis methods often use symbolic coding to represent information about the processes under study. In a number of situations, because such descriptions are fragmented, the problem arises of reconstructing a complete symbolic description of a process (a word) from its successive fragments (subwords). From the multiset of all subwords of a sufficiently large length, the original word can be uniquely restored. When the subwords are not long enough, several different reconstructions of the original word are possible. The number of feasible reconstructions can be reduced by determining the suffix and prefix of the reconstructed word. A method is proposed for determining the prefix and suffix of a word, each consisting of k – 1 symbols, from the multiset of subwords of a fixed length equal to k. We accept the hypothesis that this multiset is generated by a window of fixed length sliding over an unknown word with a shift of one symbol. The method for determining the
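The sliding-window hypothesis can be illustrated with a minimal Python sketch. It is not taken from the paper, whose method is not visible in this truncated abstract; it uses a standard counting argument instead. Every interior (k – 1)-symbol fragment starts as many k-subwords as it ends, so the word's prefix should start exactly one more subword than it ends, and the suffix should end exactly one more than it starts. The function names `subword_multiset` and `prefix_suffix` are hypothetical, and the sketch assumes the multiset is complete and that the prefix and suffix differ; when they coincide, the counts balance and the sketch reports that nothing can be decided.

```python
from collections import Counter


def subword_multiset(word, k):
    """Multiset of all length-k subwords produced by a window shifted one symbol at a time."""
    return Counter(word[i:i + k] for i in range(len(word) - k + 1))


def prefix_suffix(subwords, k):
    """Return (prefix, suffix) of length k - 1, or None when the counts do not decide.

    Assumption (not from the source): a plain start/end balance count is used.
    """
    balance = Counter()
    for sub, count in subwords.items():
        balance[sub[:k - 1]] += count  # fragment starts this subword
        balance[sub[1:]] -= count      # fragment ends this subword
    starts = [frag for frag, b in balance.items() if b == 1]
    ends = [frag for frag, b in balance.items() if b == -1]
    if len(starts) == 1 and len(ends) == 1:
        return starts[0], ends[0]
    return None  # e.g. prefix equals suffix, so every fragment is balanced


if __name__ == "__main__":
    multiset = subword_multiset("abracadabra", 3)
    print(prefix_suffix(multiset, 3))
```

For a word such as "abab" with k = 3, the prefix and suffix are both "ab", all balances are zero, and the function returns `None`; handling that case needs additional reasoning beyond this sketch.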