Cancer-related Keywords in 2023: Insights from Text Mining of a Major Consumer Portal
Authors: Wonjeong Jeong, Eunkyoung Song, Eunzi Jeong, Kyoung Hee Oh, Hye-Sun Lee, Jae Kwan Jun
Journal: Healthcare Informatics Research, 30(4), pp. 398-408
Published: 2024-10-01 (Epub 2024-10-31)
DOI: 10.4258/hir.2024.30.4.398 (https://doi.org/10.4258/hir.2024.30.4.398)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11570664/pdf/
Abstract
Objectives: With the growing importance of monitoring cancer patients' internet usage, there is an increasing need for technology that expands access to relevant information through text mining. This study analyzed internet articles from portal sites in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.
Methods: This study analyzed 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term "am" (암, Korean for "cancer") was used to identify keywords related to cancer.
Results: In 2023, an average of 1,631 cancer-related articles were published per month, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted; the most frequent were "cure" (2,218 occurrences), "lung cancer" (1,652), and "breast cancer" (1,235). Term frequency-inverse document frequency (TF-IDF) analysis ranked "struggle" (1064.172) as the most significant keyword, followed by "lung cancer" (839.988) and "breast cancer" (744.840). Network analysis revealed four distinct clusters focusing on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.
Conclusions: The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.
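The Results report two different rankings: raw keyword counts (led by "cure") and TF-IDF scores (led by "struggle"). The following is a minimal sketch of how the two measures can diverge. It assumes a tiny made-up corpus, raw counts as term frequency, log(N / df) as inverse document frequency, and summation of per-document weights. The abstract does not state which weighting or tooling the authors used, so the function names and data here are purely illustrative.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the study's data): each document is a
# list of keywords that have already been extracted from one article.
docs = [
    ["cure", "lung cancer", "treatment"],
    ["cure", "struggle", "struggle", "celebrity"],
    ["cure", "breast cancer", "treatment"],
    ["cure", "lung cancer"],
]


def term_frequencies(documents):
    """Count total occurrences of each keyword across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc)
    return counts


def tfidf_totals(documents):
    """Sum per-document TF-IDF weights for each keyword.

    Assumed weighting: TF = raw count in the document,
    IDF = log(N / df), where df is the number of documents
    containing the keyword. Other variants exist.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each keyword once per document

    totals = Counter()
    for doc in documents:
        for term, tf in Counter(doc).items():
            totals[term] += tf * math.log(n_docs / doc_freq[term])
    return totals


freq = term_frequencies(docs)
scores = tfidf_totals(docs)

print("Top by count: ", freq.most_common(3))
print("Top by TF-IDF:", scores.most_common(3))
```

In this toy corpus "cure" occurs in every document, so its IDF is log(1) = 0 and it drops out of the TF-IDF ranking despite being the most frequent keyword, while "struggle", concentrated in a single document, rises to the top.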