Data: This study examines content from Reddit's COVID-19 community, r/Coronavirus, to capture and understand the main themes and discussions around the global pandemic and how they evolved over its first year. It analyses 356,690 submissions (posts) and the 9,413,331 comments associated with them, covering the period from 20 January 2020 to 31 January 2021.
Methodology: On each of these datasets we carried out analysis based on lexical sentiment and on topics generated by unsupervised topic modelling. Negative sentiment accounted for a higher proportion than positive sentiment in submissions, whereas in comments negative and positive sentiment occurred in the same proportion. Terms with stronger positive or negative associations were identified. By assessing upvotes and downvotes, the study also uncovered contentious topics, particularly "fake" or misleading news.
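As an illustration of what a lexical sentiment comparison of this kind involves, the following is a minimal sketch only. The abstract does not name the lexicon or tool used in the study, so the word lists `POSITIVE` and `NEGATIVE` and the helpers `label` and `sentiment_shares` are hypothetical and stand in for whatever resources were actually applied.

```python
import re

# Hypothetical toy lexicons; the study's actual lexicon is not stated here.
POSITIVE = {"hope", "safe", "recovered", "good"}
NEGATIVE = {"death", "fear", "crisis", "bad"}


def label(text):
    """Label a text by counting lexicon hits: positive minus negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(texts):
    """Return the fraction of texts carrying each sentiment label."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for text in texts:
        counts[label(text)] += 1
    total = len(texts) or 1  # avoid division by zero on empty input
    return {name: n / total for name, n in counts.items()}
```

Applying `sentiment_shares` separately to submission titles and to comment bodies would give the two sets of proportions that can then be compared.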
Results: Through topic modelling, 9 distinct topics were identified from submissions and 20 from comments. Overall, this study provides a clear overview of the dominant topics and popular sentiments pertaining to the pandemic during its first year.
Conclusion: Our methodology provides a valuable tool for governments, health authorities and decision makers to obtain a deeper understanding of dominant public concerns and attitudes, which is vital for understanding, designing and implementing interventions for a global pandemic.