An internet reviews topic hierarchy mining method based on modified continuous renormalization procedure
Authors: Lin Qi, Feiyan Guo, Jian Zhang, Yuwei Wang
Venue: arXiv - PHYS - Adaptation and Self-Organizing Systems
Publication date: 2024-01-02
Identifier: arxiv-2401.01118 (https://doi.org/arxiv-2401.01118)
Citations: 0
Abstract
Mining the hierarchical structure of Internet review topics and achieving
fine-grained classification of review texts can help alleviate users' information
overload. However, existing hierarchical topic classification methods primarily
rely on external corpora and human intervention. This study proposes a Modified
Continuous Renormalization (MCR) procedure that operates on a keyword
co-occurrence network with fractal characteristics to mine the topic
hierarchy. First, the fractal characteristics of the keyword
co-occurrence network of Internet review text are identified, for the first
time, using a box-covering algorithm. Then, an MCR algorithm based on edge
adjacency entropy and box distance is proposed to obtain the
topic hierarchy in the keyword co-occurrence network. Validation on
Dangdang.com book review data shows that MCR constructs topic hierarchies
with greater coherence and independence than the HLDA and Louvain
algorithms. Finally, reliable review text classification is achieved using the
MCR-extended bottom-level topic categories. The precision (P), recall
(R), and F1 score of Internet review text classification obtained from the
MCR-based topic hierarchy are significantly higher than those of four
comparison text classification algorithms.
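
As a rough illustration of the box-covering step mentioned in the abstract, the sketch below counts how many boxes are needed to cover an unweighted network at several box sizes and estimates a box dimension from the log-log slope. It is only a minimal sketch under stated assumptions: the abstract does not say which box-covering variant the authors use, so a simple greedy ball-covering heuristic is swapped in here, and the function names (`bfs_within`, `count_boxes`, `box_dimension`) and the toy path graph are hypothetical.

```python
import math
from collections import deque


def bfs_within(adj, source, radius):
    """Return the set of nodes at most `radius` hops from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the box radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def count_boxes(adj, radius):
    """Greedy covering (an assumed heuristic, not the paper's algorithm):
    visit nodes in sorted order; each still-uncovered node becomes a box
    centre and covers every uncovered node within `radius` hops."""
    uncovered = set(adj)
    boxes = 0
    for node in sorted(adj):
        if node in uncovered:
            uncovered -= bfs_within(adj, node, radius)
            boxes += 1
    return boxes


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def box_dimension(adj, radii):
    """Estimate d_B from N_B ~ l_B^(-d_B), taking box size l_B = 2r + 1."""
    xs = [math.log(2 * r + 1) for r in radii]
    ys = [math.log(count_boxes(adj, r)) for r in radii]
    return -fit_slope(xs, ys)


def path_graph(n):
    """Toy stand-in for a keyword co-occurrence network (a simple chain)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    toy = path_graph(12)
    for r in (0, 1, 2):
        print(r, count_boxes(toy, r))
    print(box_dimension(toy, [0, 1, 2]))
```

On a real keyword co-occurrence network, an approximately straight line in the log-log plot of box count against box size over a range of sizes is what would suggest fractal scaling. The greedy heuristic gives only an upper bound on the minimum number of boxes.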
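
For readers unfamiliar with the P, R, and F1 measures named above, the following minimal sketch computes them for one class from lists of true and predicted labels. It assumes P denotes precision in the usual sense. The abstract does not state how scores are averaged across classes, so only the per-class case is shown. The function name and the example labels ("plot", "price") are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 for the label `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical topic labels for five reviews.
    y_true = ["plot", "plot", "price", "price", "plot"]
    y_pred = ["plot", "price", "price", "plot", "plot"]
    print(precision_recall_f1(y_true, y_pred, "plot"))
```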