Character Feature Extraction for Novels Based on Text Analysis
Authors: Tingting Wu, Jianming Hu, Xin Zhu
Venue: 2022 International Conference on Culture-Oriented Science and Technology (CoST)
Published: 2022-08-01
DOI: https://doi.org/10.1109/cost57098.2022.00089
Automatic analysis of characters in novels is a step toward automatic question answering with fictional characters. This paper takes the characters of Chinese martial arts novels as its research object: it constructs a corpus of 1435 novel texts and extracts a total of 57026 characters from it. Character vectors are generated by training a Skip-gram model, and their usefulness is explored in two applications. First, similarity calculation and K-means clustering are performed on the character vectors; the experimental results show that characters created by the same author are usually similar to one another. Second, gender classification is performed on the character vectors using logistic regression and a support vector machine, respectively; the experimental results show that both classifiers predict the gender of previously unseen characters well.
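The similarity and clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the vector dimensionality, the number of clusters, or the similarity measure, so the sketch assumes cosine similarity, plain Lloyd-style K-means with k=2, and four made-up 3-dimensional vectors for hypothetical characters (`hero_a`, `villain_a`, and so on). In practice the vectors would come from a trained Skip-gram model rather than being written by hand.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up vectors for hypothetical characters (assumption, not paper data).
vectors = {
    "hero_a": [0.9, 0.1, 0.0],
    "hero_b": [0.8, 0.2, 0.1],
    "villain_a": [0.0, 0.9, 0.8],
    "villain_b": [0.1, 0.8, 0.9],
}

names = list(vectors)
labels = kmeans([vectors[n] for n in names], k=2)
clusters = dict(zip(names, labels))

sim_same = cosine(vectors["hero_a"], vectors["hero_b"])
sim_diff = cosine(vectors["hero_a"], vectors["villain_a"])
print(clusters, round(sim_same, 3), round(sim_diff, 3))
```

On this toy data the two "hero" vectors fall into one cluster and the two "villain" vectors into the other, and the within-group cosine similarity is higher than the cross-group one. For a corpus of the size described, a library implementation of K-means would be the more practical choice; the hand-written version here only keeps the example dependency-free.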