Enabling Semantic Topic Modeling on Twitter Using MetaMap
Rebecca Shyu, Chunhua Weng
AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 670-678. Published 2024-05-31 (eCollection).
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141808/pdf/
Topic modeling performs poorly on short phrases or sentences and on ever-changing slang, both of which are common on social media platforms such as X, formerly known as Twitter. This study investigates whether concept annotation tools such as MetaMap can enable topic modeling at the semantic level. Using tweets mentioning "hydroxychloroquine" as a case study, we extracted 56,017 tweets posted between 03/01/2020 and 12/31/2021. The tweets were run through MetaMap to encode concepts as UMLS Concept Unique Identifiers (CUIs), and we then used Latent Dirichlet Allocation (LDA) to identify the optimal model for each of two datasets: 1) tweets with the original text and 2) tweets with recognized concepts replaced by their CUIs. We found that the MetaMap LDA models outperformed the non-MetaMap models in terms of coherence and representativeness, and that they identified topics that were timely and relevant to social and political discussions. We concluded that integrating MetaMap to standardize tweets through UMLS concepts improved semantic topic modeling performance amidst noise in the text.
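The concept-replacement step can be illustrated with a minimal sketch. It does not call MetaMap: the `PHRASE_TO_CUI` dictionary, the helper names, and the CUI values are all hypothetical placeholders chosen for illustration, and the single-word lookup is far simpler than MetaMap's phrase-level mapping. The point is only to show why replacing surface forms with shared identifiers gives short, differently worded tweets a common vocabulary before they reach a topic model.

```python
from collections import Counter

# Hypothetical term-to-CUI lookup standing in for MetaMap output.
# The CUI values are placeholders, not verified UMLS identifiers.
PHRASE_TO_CUI = {
    "hydroxychloroquine": "C0000001",
    "hcq": "C0000001",
    "plaquenil": "C0000001",
    "covid-19": "C0000002",
    "covid": "C0000002",
    "coronavirus": "C0000002",
}


def replace_with_cuis(tweet):
    """Lowercase, split on whitespace, strip edge punctuation,
    and swap any known term for its (placeholder) CUI."""
    tokens = []
    for raw in tweet.lower().split():
        word = raw.strip(".,!?#@\"'")
        if not word:
            continue
        tokens.append(PHRASE_TO_CUI.get(word, word))
    return tokens


def shared_terms(bags):
    """Return the vocabulary that the first two bags of words have in common."""
    return set(bags[0]) & set(bags[1])


tweets = [
    "HCQ for covid?",
    "Plaquenil and coronavirus!",
]

# Dataset 1: original text, naively tokenized.
original = [Counter(t.lower().split()) for t in tweets]
# Dataset 2: recognized terms replaced by CUIs.
encoded = [Counter(replace_with_cuis(t)) for t in tweets]

print(shared_terms(original))
print(shared_terms(encoded))
```

In the original-text version the two tweets share no tokens, while the CUI-encoded version shares both concept identifiers. The resulting token lists could then be passed to any LDA implementation; that step is omitted here.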