Study on Text Classification of Traditional Chinese Medicine Based on LD and ANN-SoftMaxRegressor
Fa Zhang, Hui Zhang, Xiaoling Jiang
2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), November 2019
DOI: 10.1109/ISKE47853.2019.9170353 (https://doi.org/10.1109/ISKE47853.2019.9170353)
Over the long history of traditional Chinese medicine (TCM), differing modes of expression and description have produced many synonyms and variant terms for the same concept, and the same disease is sometimes split into several diseases because its clinical manifestations differ. As a result, disease data sets become high-dimensional and sparse. Two data sets were used in the algorithm experiments: the UCI Breast Cancer data set and TCM data crawled from the Yaozhi network website. The experiments show that building a disease dictionary with string edit distance (Levenshtein distance), then applying bi-gram smoothing, and finally reducing dimensionality with PCA yields a lower feature dimension. After training with the ANN-SoftMax Regressor, fuzzy TCM symptoms are classified more accurately, reaching an accuracy of 90.95%.
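The abstract does not describe how the disease dictionary is built from Levenshtein distance. The following is a minimal sketch assuming one plausible policy: each raw term is mapped to its nearest canonical term when the edit distance is at most a threshold, and is kept unchanged otherwise. The names `levenshtein` and `normalize_terms`, the `max_dist` threshold and the sample terms are illustrative assumptions, not the authors' code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions that turn a into b (two-row DP table)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as b
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalize_terms(terms, canonical, max_dist=1):
    """Hypothetical dictionary step: map each raw term to the closest
    canonical term if it is within max_dist edits, else keep it as-is."""
    mapping = {}
    for term in terms:
        best = min(canonical, key=lambda c: levenshtein(term, c))
        mapping[term] = best if levenshtein(term, best) <= max_dist else term
    return mapping


if __name__ == "__main__":
    canonical = ["头痛", "咳嗽", "发热"]        # headache, cough, fever
    raw = ["头疼", "咳嗽", "发烧", "失眠"]      # headache variant, cough, fever variant, insomnia
    print(normalize_terms(raw, canonical))
    # {'头疼': '头痛', '咳嗽': '咳嗽', '发烧': '发热', '失眠': '失眠'}
```

Here 头疼 merges into 头痛 and 发烧 into 发热 (one edit each), while 失眠 is two edits from every entry and stays a separate term. For two-character terms a threshold of 1 is already permissive, so the threshold would need checking against known synonym pairs.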
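The abstract names a bi-gram smoothing step without specifying the scheme. As an illustration only, the sketch below assumes add-one (Laplace) smoothing of character bigrams, which estimates P(c_i | c_{i-1}) = (count(c_{i-1} c_i) + 1) / (count(c_{i-1} as a left context) + V), where V is the vocabulary size. Every bigram, including unseen ones, then receives a nonzero probability. The class name `BigramModel` and the toy corpus are hypothetical.

```python
from collections import Counter


class BigramModel:
    """Character-bigram model with add-one (Laplace) smoothing.

    Assumption: add-one smoothing is used only for illustration; the
    paper's actual smoothing method is not stated in the abstract."""

    def __init__(self, corpus):
        self.bigrams = Counter()  # count of each (left, right) character pair
        self.left = Counter()     # how often each character is followed by another
        self.vocab = set()
        for text in corpus:
            self.vocab.update(text)
            for a, b in zip(text, text[1:]):
                self.bigrams[(a, b)] += 1
                self.left[a] += 1

    def prob(self, prev, char):
        """Smoothed P(char | prev); sums to 1 over the vocabulary."""
        v = len(self.vocab)
        return (self.bigrams[(prev, char)] + 1) / (self.left[prev] + v)


if __name__ == "__main__":
    model = BigramModel(["头痛发热", "头痛咳嗽"])
    print(model.prob("头", "痛"))  # seen twice: (2 + 1) / (2 + 6) = 0.375
    print(model.prob("头", "咳"))  # unseen:     (0 + 1) / (2 + 6) = 0.125
```

Counting the left context (rather than all occurrences of a character) keeps the smoothed distribution for each preceding character properly normalized.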
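The abstract applies PCA after the dictionary and smoothing steps to lower the feature dimension. The sketch below shows standard PCA on a small dense matrix: center the data, form the sample covariance matrix, and extract the top-k eigenvectors. Power iteration with deflation is a swapped-in choice made only to keep the example dependency-free; for real data a library eigensolver or SVD would normally be used. The function name `pca` and its parameters are illustrative.

```python
import math
import random


def pca(rows, k, iters=1000, seed=0):
    """Project rows (equal-length lists of floats, at least two rows) onto
    their top-k principal components. Returns (projected_rows, components)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix, d x d.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            lam = math.sqrt(sum(t * t for t in w))  # ||Cv|| -> eigenvalue as v converges
            if lam < 1e-12:
                break
            v = [t / lam for t in w]  # renormalize to unit length
        if lam < 1e-12:
            break  # no variance left; fewer than k components exist
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    projected = [[sum(x[j] * c[j] for j in range(d)) for c in components]
                 for x in centered]
    return projected, components


if __name__ == "__main__":
    # Toy points on the line y = 2x: one component captures all the variance.
    proj, comps = pca([[1, 2], [2, 4], [3, 6], [4, 8]], k=1)
    print(comps)  # expected direction: (1, 2) / sqrt(5), up to sign
```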
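The abstract does not give the architecture of the ANN-SoftMax Regressor. The sketch below shows only its simplest form, a softmax-regression (multinomial logistic regression) layer with no hidden layers, trained by full-batch gradient descent on mean cross-entropy loss; for softmax with cross-entropy, the gradient with respect to each logit is the predicted probability minus the one-hot target. The function names, the hyperparameters `lr` and `epochs`, and the toy feature vectors are assumptions for illustration, and any hidden layers the authors used are not reproduced.

```python
import math


def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax(X, y, num_classes, lr=0.5, epochs=2000):
    """Fit weights W (num_classes x d) and biases b by full-batch gradient
    descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, label in zip(X, y):
            logits = [sum(W[k][j] * x[j] for j in range(d)) + b[k]
                      for k in range(num_classes)]
            p = softmax(logits)
            for k in range(num_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # dLoss/dlogit_k
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        for k in range(num_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict(W, b, x):
    """Index of the class with the highest score (softmax is monotonic)."""
    scores = [sum(wj * xj for wj, xj in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    # Toy 3-class data standing in for reduced symptom features.
    X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
         [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
         [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
    y = [0, 0, 1, 1, 2, 2]
    W, b = train_softmax(X, y, num_classes=3)
    acc = sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(y)
    print(f"training accuracy: {acc:.2%}")  # training accuracy: 100.00%
```

Accuracy on the training points says nothing about generalization; a figure such as the 90.95% reported in the abstract would come from held-out data.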