A. Khan, Fida Kamal, Nuzhat Nower, Tasnim Ahmed, Tareque Mohmud Chowdhury
Published in: 2022 25th International Conference on Computer and Information Technology (ICCIT), 2022-12-17
DOI: 10.1109/ICCIT57492.2022.10054937 (https://doi.org/10.1109/ICCIT57492.2022.10054937)
An Evaluation of Transformer-Based Models in Personal Health Mention Detection
In public health surveillance, identifying Personal Health Mentions (PHMs) is an essential first step. The task is to examine a social media post that mentions an illness and determine whether the post describes an actual person experiencing that illness. Monitoring such public health-related posts is crucial when estimating how far a disease has spread, and numerous datasets have been produced to help researchers develop techniques for this task. Unfortunately, social media posts often contain links, emojis, informal phrasing, and sarcasm, which makes them challenging to work with. To handle these issues and detect PHMs directly from social media posts, we propose a few transformer-based models and compare their performance. These models are known to perform well on other language-related tasks but have not been thoroughly evaluated in this domain. We trained the models on an imbalanced dataset built from a large number of public Twitter posts. The empirical results show state-of-the-art performance on the dataset, with the RoBERTa-based classifier achieving an average F1 score of 94.5%. The code used in our experiments is publicly available.
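The abstract reports an average F1 score on an imbalanced dataset but does not state how the average is taken. As an illustration only, the sketch below assumes macro-averaging (the unweighted mean of per-class F1), a common choice when classes are imbalanced because it keeps the minority class from being swamped by the majority. The function names and the toy labels are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); define it as 0.0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy data: 8 non-PHM posts (0) and 2 PHM posts (1).
y_true = [0] * 8 + [1] * 2
# The classifier misses one of the two PHM posts.
y_pred = [0] * 8 + [0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy data, accuracy looks high even though half of the minority-class posts are missed, while macro F1 drops noticeably. That gap is why an F1-based metric is usually preferred over accuracy for imbalanced PHM data.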