Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678155
C. Gurrin
This position paper presents the motivation for, and early progress in, lifelog analytics and retrieval. It reviews early work in the field with a specific focus on the challenge of developing personal Memex-style lifelog search engines.
Title: Personal Data Matters: New Opportunities from Lifelogs | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678160
S. Kongyoung, Kanokorn Trakultaweekoon, A. Rugchatjaroen
Thai can be grouped with Chinese and Japanese as a language with no explicit spaces between words. This article presents work on identifying the emotions in tweets from their use of emojis, focusing on a Thai-language context, on the premise that the emojis in a tweet indicate the writer’s emotions. The first phase of the study collected and cleaned Thai tweets, then made a primary grouping of the emojis using k-means clustering. These clusters serve as the target outputs for emoji-class prediction. K = 22 was found to be appropriate for the 70 emojis in the collected tweets. The corpus covers all levels of Thai language usage, so the processed data can contain suffixes, slang, and words unknown to the tokenizer. The vector representation advances the unknown accent. In sum, this research created a corpus of short messages collected from Twitter, grouped into 22 emoji classes. The corpus contains 7,825,857 messages prepared for emotion classification with a model of two biLSTM layers. A table mapping emojis to Ekman’s six basic emotions (anger, disgust, fear, joy, sadness, and surprise) is proposed and evaluated in both objective and subjective tests. The results show that word vectors work well for classifying emotions through the use of emojis.
Title: Thai Language Tweet Emotion Prediction based on Use of Emojis | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
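The abstract does not say which features the 70 emojis were clustered on, so the sketch below assumes each emoji is summarized by a small hypothetical feature vector (`EMOJI_FEATURES`, here a made-up valence/arousal pair). It runs plain Lloyd's k-means with farthest-point seeding (our choice, not necessarily the authors') and then uses an emoji's cluster id as the target class of a tweet containing it. The paper uses K = 22 over 70 emojis; this toy uses K = 3 over 7, and `label_tweet` is a hypothetical helper.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]

def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; returns (centroids, cluster label per point)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical 2-D features per emoji; not taken from the paper.
EMOJI_FEATURES = {
    "😂": (0.90, 0.80), "🤣": (0.95, 0.85), "😊": (0.85, 0.75),
    "😢": (-0.80, -0.70), "😭": (-0.85, -0.80),
    "😡": (-0.80, 0.90), "🤬": (-0.90, 0.95),
}

emojis = list(EMOJI_FEATURES)
_, cluster_ids = kmeans([EMOJI_FEATURES[e] for e in emojis], k=3)
emoji_class = dict(zip(emojis, cluster_ids))

def label_tweet(text):
    """Use the cluster of the first known emoji in a tweet as its target class."""
    for ch in text:
        if ch in emoji_class:
            return emoji_class[ch]
    return None
```

Farthest-point seeding is used only to make the toy deterministic; with random seeding, k-means can settle in a local minimum that merges two emotion groups.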
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678158
Mawin Khumdee, Pongpol Assawaroongsakul, P. Phasukkit, Nongluck Houngkamhang
This paper proposes detecting the position of breast cancer using an impulse-radio ultra-wideband (IR-UWB) system with deep learning, an interesting alternative to conventional imaging. Compared with ultrasound, X-ray mammography, and CT scans, IR-UWB offers several advantages, including low cost, lower energy requirements, fewer long-term effects, portability, and much broader access to breast cancer screening for patients. Many techniques exist for processing IR-UWB signals, and deep learning is one of the most promising. In this study, we collected data from nine IR-UWB antennas. The prepared data are then fed through deep neural networks to find hidden patterns in the signal and predict the cancer position as one of 17 classes: 16 breast cancer positions plus one "undetected" class. The model achieved an average accuracy of up to 95.60%.
Title: Breast Cancer Detection using IR-UWB with Deep Learning | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
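The 17-class output scheme (16 positions plus "undetected") can be made concrete with a small NumPy sketch. Only the antenna count and class count come from the abstract; the per-antenna signal length, the layer sizes, and the simple ReLU/softmax network are hypothetical stand-ins, since the abstract does not describe the authors' architecture. The network is untrained, so its predictions here are arbitrary.

```python
import numpy as np

N_ANTENNAS = 9               # from the abstract
SAMPLES_PER_ANTENNA = 64     # hypothetical signal length per antenna
N_POSITIONS = 16             # from the abstract
UNDETECTED = N_POSITIONS     # class index 16 = no cancer detected
N_CLASSES = N_POSITIONS + 1  # 17 classes in total

def decode(class_idx):
    """Map a class index back to a human-readable label."""
    return "undetected" if class_idx == UNDETECTED else f"position {class_idx + 1}"

def init_mlp(layer_sizes, seed=0):
    """He-initialized weights for a fully connected network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """ReLU hidden layers followed by a softmax over the 17 classes."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    h = h - h.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(h)
    return e / e.sum(axis=-1, keepdims=True)

# Usage with random stand-in signals for a batch of 4 scans.
rng = np.random.default_rng(1)
signals = rng.normal(size=(4, N_ANTENNAS, SAMPLES_PER_ANTENNA))
x = signals.reshape(len(signals), -1)  # flatten antennas x samples into one feature vector
params = init_mlp([x.shape[1], 128, 64, N_CLASSES])
probs = forward(params, x)
predictions = [decode(int(i)) for i in probs.argmax(axis=1)]
```

Treating "undetected" as an ordinary extra class lets a single softmax head handle both detection and localization.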
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678152
Thanaphut Khuntiyaporn, Pokpong Songmuang, W. Limprasert
The Jobshop Scheduling Problem (JSP) is a classic complex problem that arises in many fields, such as education, business, and daily life. The problem has evolved as its problem space has changed, so JSP problems are now categorized into several types, including General Jobshop Scheduling (GJSP), Flexible Jobshop Scheduling (FJSP), and Multiple-routes Jobshop Scheduling (MrJSP). However, most research on JSP has focused on finding the shortest makespan, and a minimum-makespan schedule can sometimes lead to very high operating costs, which significantly affect operating results. This research therefore focuses on the Multiple-objectives Flexible Jobshop Scheduling Problem (M-FJSP). The proposed method is a Reinforcement Learning (RL) model with a Q-Learning algorithm. The experimental dataset comes from the OR-Library, a collection of instances for a variety of Operations Research (OR) problems. The proposed models are compared across three different state definitions, and we expect the meta-heuristic model to perform best.
Title: The Multiple Objectives Flexible Jobshop Scheduling Using Reinforcement Learning | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
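A minimal sketch of tabular Q-learning on a multi-objective scheduling toy is given below, assuming a weighted-sum scalarization of time and cost into one reward. The instance (`OPTIONS`), the weights, and the state definition (the index of the next operation) are all hypothetical; this is not one of the paper's three state definitions and not an OR-Library instance. Because the toy has a single chain of operations, total processing time stands in for makespan and machine contention is ignored.

```python
import random

# Hypothetical toy instance: operation s can run on machine 0 or 1, and each
# choice has a (processing_time, operating_cost) pair.
OPTIONS = [
    [(3, 5), (5, 2)],
    [(4, 6), (6, 1)],
    [(2, 4), (4, 3)],
]

def q_learning(options, w_time, w_cost, episodes=2000, alpha=0.1,
               gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning: state = index of the next operation,
    action = machine chosen for it."""
    rng = random.Random(seed)
    n_states, n_actions = len(options), len(options[0])
    # One extra row for the terminal state, whose values stay at 0.
    q = [[0.0] * n_actions for _ in range(n_states + 1)]
    for _ in range(episodes):
        for s in range(n_states):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            time, cost = options[s][a]
            # Weighted-sum scalarization of the two objectives.
            reward = -(w_time * time + w_cost * cost)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward + gamma * max(q[s + 1])
            q[s][a] += alpha * (target - q[s][a])
    return q

def greedy_policy(q, n_states):
    """Machine choice per operation under the learned Q-table."""
    return [max(range(len(q[s])), key=lambda i: q[s][i]) for s in range(n_states)]

print(greedy_policy(q_learning(OPTIONS, w_time=1, w_cost=0), len(OPTIONS)))  # [0, 0, 0]
print(greedy_policy(q_learning(OPTIONS, w_time=1, w_cost=1), len(OPTIONS)))  # [1, 1, 0]
```

Changing the weights shifts the learned schedule away from the fastest machines toward cheaper ones, which is the trade-off the abstract says a makespan-only objective ignores.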
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678151
L. Kovavisaruch, T. Sanpechuda
Recommendation systems for museums have been an active research area over the past decade. Making a personalized recommendation list for each museum-goer used to be difficult, but current technology, such as mobile applications, now makes it possible to deliver such lists to visitors. In a previous paper, we proposed a recommendation system based on social filtering and statistical methods. This paper uses the F1-score to evaluate those recommendation methods on actual visitor logs from the Chao Sampradaya National Museum. We compare the social filtering method with the statistical method, using random recommendation as a benchmark. The statistical method gives the same result as social filtering when the visit time is limited, while social filtering performs better the longer a visitor spends in the museum. In terms of computational complexity, however, the statistical method outperforms social filtering.
Title: The comparison of the proposed recommended system with actual data | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
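The abstract does not define how the F1-score is computed against visitor logs; a plausible reading, assumed here, is a per-visitor set comparison between the recommended exhibits and the exhibits actually visited. The sketch below implements that assumed definition plus a random-recommendation baseline. Exhibit IDs and the visitor log are hypothetical.

```python
import random

def f1_score(recommended, visited):
    """Set-based F1 between a recommendation list and the exhibits a visitor
    actually visited (assumed per-visitor definition)."""
    rec, vis = set(recommended), set(visited)
    hits = len(rec & vis)
    if hits == 0:
        return 0.0
    precision = hits / len(rec)  # share of recommendations that were visited
    recall = hits / len(vis)     # share of visits that were recommended
    return 2 * precision * recall / (precision + recall)

def random_baseline_f1(catalog, visited, k, trials=1000, seed=0):
    """Average F1 of k exhibits drawn uniformly at random from the catalog."""
    rng = random.Random(seed)
    return sum(f1_score(rng.sample(catalog, k), visited) for _ in range(trials)) / trials

# Hypothetical exhibit IDs and one visitor's log.
catalog = [f"E{i}" for i in range(1, 21)]
recommended = ["E1", "E2", "E3", "E4"]
visited = ["E2", "E4", "E7"]
print(f1_score(recommended, visited))  # 4/7 ≈ 0.571
print(random_baseline_f1(catalog, visited, k=len(recommended)))
```

Averaging this score over many visitors, split by visit duration, would support the kind of time-limited comparison the abstract describes.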
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678163
Tith Vong, C. Jeenanunta, Apinun Tunpan, Nisit Sirimarnkit
Production planners cannot get real-time updates on the actual number of products made, so they do not notice a production mismatch until a few days later. Planners therefore need to revise their production plans with reserve capacity for the mismatch, which wastes considerable manufacturing time and money. Production output is usually counted manually at the end of the day and recorded on paper. This paper proposes an image processing system that counts products with a timestamp. YOLOv4-tiny and OpenCV's DNN module are used to detect objects. Detected objects are counted using intersection detection, and the Tesseract engine extracts the time from the video. The object detector is trained with 10-fold cross-validation on 106 object photos. The approach is tested on 8 videos for counting accuracy and timestamp accuracy. Compared with manual counting with timestamps, the proposed method achieves 100% object-counting accuracy and 80% timestamp accuracy. The proposed technique is suitable for counting objects with timestamps in real time.
Title: The Low Computation and Real-Time Shoe Detection with Timestamp for Production Tracking in Shoe Manufacturing | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
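One common way to implement intersection-based counting, assumed here, is to count an object when its tracked centroid crosses a virtual line in the frame. The sketch below takes already-tracked centroids as input (the YOLOv4-tiny detection and tracking steps are not reproduced) and, in place of the paper's Tesseract OCR of the on-screen clock, derives a timestamp from the frame index and frame rate; that substitution is ours. Image y coordinates grow downward, so "above the line" means a smaller y. Track data, `line_y`, `fps`, and `start` are hypothetical.

```python
from datetime import datetime, timedelta

def count_line_crossings(tracks, line_y):
    """tracks maps a track id to [(frame_idx, centroid_y), ...] in frame order.
    Each object is counted once, at the first frame where its centroid moves
    from above the counting line to on or below it."""
    events = []
    for track_id, path in tracks.items():
        for (_, y_prev), (frame, y_cur) in zip(path, path[1:]):
            if y_prev < line_y <= y_cur:
                events.append((track_id, frame))
                break  # never count the same object twice
    return sorted(events, key=lambda e: e[1])

def frame_to_timestamp(frame_idx, fps, start):
    """Timestamp from the frame index (a stand-in for OCR of the video clock)."""
    return start + timedelta(seconds=frame_idx / fps)

tracks = {
    1: [(0, 100), (1, 200), (2, 260), (3, 300)],  # crosses at frame 2
    2: [(5, 50), (6, 120), (7, 180)],             # never reaches the line
    3: [(8, 230), (9, 240), (10, 250)],           # touches the line at frame 9
}
start = datetime(2021, 12, 21, 8, 0, 0)
for track_id, frame in count_line_crossings(tracks, line_y=240):
    print(track_id, frame_to_timestamp(frame, fps=30, start=start))
```

Counting at a line crossing, rather than per frame, keeps a shoe that stays in view for many frames from being counted more than once.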
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678179
Language model pre-training techniques have been applied successfully to many natural language processing and text-mining tasks. However, published studies on automatic IT service desk ticket categorization have mostly used the traditional bag-of-words (BoW) model and focused on tickets written in only one language. This paper therefore examines applying state-of-the-art language model pre-training approaches to automatically determine the service category of bilingual IT service desk tickets, specifically tickets containing Thai and/or English text. Three well-known algorithms, mBERT, ULMFiT, and XLM-R, are investigated using an in-house real-world dataset, with three ensemble methods using bag-of-words text representation as performance baselines. In our experiments, the pre-trained language models outperform the BoW-based ensemble methods on bilingual IT ticket categorization. XLM-R gives the highest overall performance, with 87.02% accuracy and an 86.96% F1-score on the test dataset, followed by ULMFiT, mBERT, and the ensemble methods, respectively.
Title: Bilingual IT Service Desk Ticket Classification Using Language Model Pre-training Techniques | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
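Because the dataset mixes Thai-only, English-only, and code-switched tickets, it can help to slice evaluation results by script composition. The helper below is a hypothetical preprocessing sketch, not part of the paper's pipeline: it uses the Unicode Thai block (U+0E00 to U+0E7F) and assumes that any ASCII letter indicates English text.

```python
THAI_BLOCK = range(0x0E00, 0x0E80)  # Unicode Thai block, U+0E00..U+0E7F

def script_mix(text):
    """Classify a ticket as 'thai', 'english', 'mixed', or 'none' by script.
    ASCII letters are treated as English (an assumption)."""
    has_thai = any(ord(ch) in THAI_BLOCK for ch in text)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_thai and has_latin:
        return "mixed"
    if has_thai:
        return "thai"
    if has_latin:
        return "english"
    return "none"

tickets = ["VPN cannot connect", "ลืมรหัสผ่าน", "Printer ชั้น 3 ใช้งานไม่ได้", "12345"]
print([script_mix(t) for t in tickets])  # ['english', 'thai', 'mixed', 'none']
```

Reporting accuracy and F1 separately for each slice would show whether a multilingual model such as XLM-R gains most on the mixed tickets.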
Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678189
S. Watcharabutsarakham, S. Marukatat, Supphachoke Suntiwichaya, Chanchai Junlouchai
People now go outside wearing face masks, so face detection and face recognition models need to take masks into account. Facial recognition has been researched widely with various algorithms. Since the coronavirus disease 2019 (COVID-19) outbreak spread across Thailand, our use of face recognition models has reminded people to wear a face mask, because people who go outside are likely to be exposed to facial image detection and classification methods used for authentication and authorization. In this paper, we apply transfer learning with models such as YOLOv3, training on public datasets and donated datasets. Our models recognize faces with 98.7% accuracy and identify faces, including masked faces, with 92.7% accuracy.
Title: Partial Facial Identification using Transfer Learning Technique | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Kale is a popular ingredient in Thai cuisine and can be grown year-round. However, kale requires particular care, especially against pests. This study therefore applies the Internet of Things to propose KaleCare, a smart farm management system for kale with four main functions: automatic watering based on weather forecasts, automatic fertilizing, reporting, and pest detection for cutworms and aphids. The pest classification models for the pest detection function are created in three steps. First, GrabCut is applied to the raw images to remove the background. Second, because the raw data are limited, data augmentation is applied to generate more images. Finally, a modified GoogLeNet that reduces the original GoogLeNet structure is proposed to classify both types of pests. The experimental results show that the proposed model performs best, with average classification rates of 0.8903 and 0.7959 and average F1-scores of 0.886 and 0.7965 for cutworms and aphids, respectively.
Title: KaleCare: Smart Farm for Kale with Pests Detection System using Machine Learning | Authors: Natthaphon Tachai, Perapat Yato, Teerachai Muangpan, Krittakom Srijiranon, Narissara Eiamkanitchat | Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678178 | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
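The abstract does not list which augmentations were used, so the sketch below assumes simple geometric ones: the 8 combinations of 90-degree rotations and horizontal mirroring, which multiply a small set of background-removed crops eightfold without changing the pest's appearance. The background-removal step itself (OpenCV's `cv2.grabCut`) is not reproduced; the crop is a hypothetical stand-in with a zeroed background.

```python
import numpy as np

def augment(image):
    """Return the 8 dihedral variants of an H x W (x C) image:
    4 rotations of the original and 4 rotations of its horizontal mirror."""
    variants = []
    for base in (image, np.fliplr(image)):
        for k in range(4):
            variants.append(np.rot90(base, k))  # rotates the first two (H, W) axes
    return variants

# Hypothetical stand-in for a background-removed pest crop (zeros = removed background).
crop = np.zeros((32, 48, 3), dtype=np.uint8)
crop[8:20, 10:30] = 200
augmented = augment(crop)
print(len(augmented), sorted({v.shape for v in augmented}))
```

Rotations by 90 and 270 degrees swap height and width, so the variants would still need resizing to the classifier's fixed input size.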
In this paper, we present LIMESODA, our Thai fake news dataset in the healthcare domain, together with its construction guideline. Each document in the dataset is labeled as fact, fake, or undefined. We also provide token-level annotations for validating classifier decisions, using five high-level annotation tags: 1) misleading headline, 2) imposter, 3) fabrication, 4) false connection, and 5) misleading content. We curated and manually annotated 7,191 documents with these tags. We evaluate the dataset with two deep learning approaches, RNN and Transformer baselines, and analyze token-level contributions to understand model behavior. For the RNN model, we use the attention weights as token-level contributions; for the Transformer models, we use the integrated gradients method at the embedding layers. Finally, we compare these token-level contributions with the human annotations. Although our baseline models yield promising performance, we found that the tokens supporting model decisions are quite different from those in the human annotations.
Title: LimeSoda: Dataset for Fake News Detection in Healthcare Domain | Authors: Patomporn Payoungkhamdee, Peerachet Porkaew, Atthasith Sinthunyathum, Phattharaphon Songphum, Witsarut Kawidam, Wichayut Loha-Udom, P. Boonkwan, Vipas Sutantayawalee | Pub Date: 2021-12-21 | DOI: 10.1109/iSAI-NLP54397.2021.9678187 | Venue: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
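Integrated gradients attributes a prediction to input features by averaging the gradient along a straight path from a baseline x' to the input x and multiplying by (x - x'). The sketch below applies it to a flattened token-embedding matrix and sums each token's dimensions into one contribution score. A tiny logistic model with an analytic gradient replaces the Transformer, and the all-zero baseline, embeddings, and weights are assumptions for illustration, not the authors' setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Toy differentiable classifier: p(fake) = sigmoid(w . x + b) on a flattened embedding."""
    return sigmoid(x @ w + b)

def grad_predict(x, w, b):
    """Analytic gradient of predict with respect to the input x."""
    p = predict(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """IG_i = (x_i - x'_i) * average gradient along the straight path from x' to x,
    approximated with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_predict(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical document: 4 tokens with 3-dimensional embeddings, and random weights.
rng = np.random.default_rng(0)
n_tokens, dim = 4, 3
embeddings = rng.normal(size=(n_tokens, dim))
x = embeddings.ravel()
baseline = np.zeros_like(x)  # all-zero embedding baseline (a common choice)
w, b = rng.normal(size=x.size), 0.1

ig = integrated_gradients(x, baseline, w, b)
token_scores = ig.reshape(n_tokens, dim).sum(axis=1)  # one contribution per token
```

A useful sanity check is completeness: the attributions should sum to approximately predict(x) - predict(baseline). Token scores like these are what would be compared against the human-annotated spans.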