Differing Content and Language Based on Poster-Patient Relationships on the Chinese Social Media Platform Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer.
Zhouqing Zhang, Kongmeng Liew, Roeline Kuijer, Wan Jou She, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki
JMIR Cancer, published 2024-05-09. DOI: 10.2196/51332 (https://doi.org/10.2196/51332)
Abstract
Background: Breast cancer affects the lives of not only those diagnosed but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and what their relationship with the patient is; a patient posting about their own experiences may post different content from someone whose friend or family member has breast cancer. Weibo is one of the most popular social media platforms in China, and breast cancer-related posts are frequently found there.
Objective: To understand the different experiences of those affected by breast cancer in China, we explored how the content and language of relevant posts differ according to who the poster is and their relationship with the patient. In particular, we examined whether emotional expression and topic content differ depending on whether the patient is the poster themselves or a friend, family member, relative, or acquaintance.
Methods: We used Weibo as a resource to examine how posts differ across poster-patient relationships, collecting a total of 10,322 relevant posts. Using a 2-step analysis method, we fine-tuned 2 Chinese RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers [BERT] Pretraining Approach) models on this data set, which was annotated with poster-patient relationships. The models were chained in sequence to classify the poster-patient relationship: first a binary classifier (no_patient or patient), then a multiclass classifier (post_user, family_members, friends_relatives, acquaintances, heard_relation). Next, we used the Linguistic Inquiry and Word Count (LIWC) lexicon to conduct sentiment analysis across 5 emotion categories (positive emotions, negative emotions, anger, sadness, and anxiety), followed by topic modeling with BERTopic.
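The 2-step classification can be pictured as a cascade in which the multiclass model only sees posts that the binary model labels "patient". The sketch below is a minimal illustration under that assumption; the abstract does not describe how the two models are wired beyond running them in sequence. The label strings come from the abstract, but `classify_relationship`, `toy_binary`, and `toy_multiclass` are hypothetical names, and the keyword stubs merely stand in for the two fine-tuned Chinese RoBERTa models.

```python
from typing import Callable

# Label sets as named in the abstract.
BINARY_LABELS = ("no_patient", "patient")
RELATION_LABELS = (
    "post_user",
    "family_members",
    "friends_relatives",
    "acquaintances",
    "heard_relation",
)

# Any function mapping a post to one label. In the study each stage is a
# fine-tuned Chinese RoBERTa model; here it is an abstract callable (assumption).
Classifier = Callable[[str], str]


def classify_relationship(post: str, binary: Classifier, multiclass: Classifier) -> str:
    """Hypothetical cascade: stage 2 runs only on posts that stage 1 labels 'patient'."""
    first = binary(post)
    if first not in BINARY_LABELS:
        raise ValueError(f"unexpected binary label: {first!r}")
    if first == "no_patient":
        return "no_patient"  # final label; the multiclass model is skipped
    second = multiclass(post)
    if second not in RELATION_LABELS:
        raise ValueError(f"unexpected relationship label: {second!r}")
    return second


# Hypothetical keyword stubs standing in for the two fine-tuned models.
def toy_binary(post: str) -> str:
    # "确诊" = diagnosed, "乳腺癌" = breast cancer
    return "patient" if ("确诊" in post or "乳腺癌" in post) else "no_patient"


def toy_multiclass(post: str) -> str:
    if "妈妈" in post:  # "妈妈" = mom
        return "family_members"
    if "我" in post:  # "我" = I / me
        return "post_user"
    return "heard_relation"


if __name__ == "__main__":
    for text in ["我妈妈确诊了乳腺癌", "我确诊乳腺癌三个月了", "癌症太可怕了"]:
        print(text, "->", classify_relationship(text, toy_binary, toy_multiclass))
```

One consequence of this design is that "no_patient" becomes a final class alongside the 5 relationship labels, which matches how the Results below compare the "no_patient" class with the others.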
Results: Our binary model (F1-score=0.92) and multiclass model (F1-score=0.83) classified poster-patient relationships largely accurately. Subsequent sentiment analysis showed significant differences in emotion categories across all poster-patient relationships. Notably, negative emotions and anger were higher for the "no_patient" class, whereas sadness and anxiety were higher for the "family_members" class. Among the top 30 topics, topics on fear of and anger toward cancer were more prevalent in the "no_patient" class, whereas topics on cancer treatment were more prevalent in the "family_members" class.
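LIWC-style analysis counts the share of a text's words that fall into each lexicon category, so per-class emotion differences can be compared by averaging those shares over posts with the same relationship label. The sketch below assumes posts are already word-segmented (Chinese text needs a segmenter, such as jieba, first) and that scores are averaged per post; the abstract does not state either detail. `LEXICON` is a tiny hypothetical word list, not the actual LIWC dictionary, and `category_percentages` and `mean_scores_by_class` are illustrative names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon, NOT the actual LIWC dictionary.
# As in LIWC, anger/sadness/anxiety words also count as negative emotion.
ANGER = {"讨厌", "生气"}  # hate, angry
SADNESS = {"难过", "伤心"}  # sad, heartbroken
ANXIETY = {"担心", "害怕"}  # worried, afraid
LEXICON = {
    "positive": {"希望", "加油", "感谢"},  # hope, keep going, thanks
    "negative": {"可怕"} | ANGER | SADNESS | ANXIETY,  # "可怕" = terrifying
    "anger": ANGER,
    "sadness": SADNESS,
    "anxiety": ANXIETY,
}


def category_percentages(tokens: list[str]) -> dict[str, float]:
    """Percentage of a post's tokens that fall into each emotion category."""
    if not tokens:
        return {cat: 0.0 for cat in LEXICON}
    return {
        cat: 100.0 * sum(tok in words for tok in tokens) / len(tokens)
        for cat, words in LEXICON.items()
    }


def mean_scores_by_class(posts: list[tuple[str, list[str]]]) -> dict[str, dict[str, float]]:
    """Average each category's percentage over all posts sharing a label."""
    grouped: dict[str, list[dict[str, float]]] = defaultdict(list)
    for label, tokens in posts:
        grouped[label].append(category_percentages(tokens))
    return {
        label: {cat: mean(s[cat] for s in scores) for cat in LEXICON}
        for label, scores in grouped.items()
    }


if __name__ == "__main__":
    # Toy posts, assumed to be already word-segmented.
    posts = [
        ("no_patient", ["癌症", "太", "可怕", "了", "讨厌"]),
        ("family_members", ["妈妈", "化疗", "我", "很", "担心"]),
        ("family_members", ["妈妈", "手术", "难过"]),
    ]
    for label, scores in mean_scores_by_class(posts).items():
        print(label, {cat: round(v, 1) for cat, v in scores.items()})
```

Testing whether such class means differ significantly would be a separate statistical step, which this sketch does not attempt.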
Conclusions: Chinese users post different types of content depending on the poster-patient relationship. When the patient is a family member, posts are sadder and more anxious but also contain more content on treatment. When no patient is detected, however, posts show higher levels of anger. We suggest that these angry posts may be rants, which may help posters regulate their emotions and gather social support.