Clickbait Detection of Indonesian News Headlines using Fine-Tune Bidirectional Encoder Representations from Transformers (BERT)
Authors: Diyah Utami Kusumaning Putri, Dinar Nugroho Pratomo
Journal: Inform Jurnal Ilmiah Bidang Teknologi Informasi dan Komunikasi
Published: 2022-07-30
DOI: https://doi.org/10.25139/inform.v7i2.4686
Citations: 1
Abstract
News headlines that do not match the article content, known as clickbait, seriously hinder readers from getting the information they expect. The amount of clickbait news has increased significantly in recent years. To address this problem, a clickbait detector is needed that automatically classifies news headlines as clickbait or non-clickbait. Many existing solutions rely on handcrafted features and traditional machine learning methods, which limits their generalization. This study therefore fine-tunes Bidirectional Encoder Representations from Transformers (BERT) on the Indonesian news headline dataset CLICK-ID to predict clickbait. We use IndoBERT, a state-of-the-art BERT-based language model for Indonesian, as the pre-trained model. The usefulness of BERT-based classifiers is assessed by comparing IndoBERT classifiers built on different pre-trained models against five machine learning classifiers (Naive Bayes (NB), k-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), and random forest (RF)), each trained on two word-vector representations (bag-of-words and TF-IDF). The evaluation results indicate that all fine-tuned IndoBERT classifiers outperform all word-vector-based machine learning classifiers in classifying Indonesian news headlines as clickbait or non-clickbait. The IndoBERT_BASE model trained with two training phases achieves the highest accuracy, 0.8247, which is 0.064 (about 6 percentage points) higher than the 0.7607 accuracy of the best baseline, the SVM classifier with the bag-of-words model.
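
As a rough illustration of the two word-vector baselines named in the abstract, the sketch below builds bag-of-words counts and TF-IDF weights using only the Python standard library. It is not the authors' pipeline: the three headlines are invented toy examples (not taken from CLICK-ID), the whitespace tokenizer is a simplification, and the unsmoothed weighting tf × log(N / df) is an assumed variant, since the abstract does not say which TF-IDF formula was used.

```python
import math
from collections import Counter


def tokenize(headline):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return headline.lower().split()


def build_vocabulary(headlines):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for h in headlines for tok in tokenize(h)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(headline, vocab):
    """Return raw term counts for one headline, one entry per vocabulary token."""
    counts = Counter(tokenize(headline))
    # Dicts preserve insertion order, so iteration follows the column indices.
    return [counts.get(tok, 0) for tok in vocab]


def tf_idf(headlines, vocab):
    """Weight each count by log(N / df), where df is the document frequency."""
    n_docs = len(headlines)
    bows = [bag_of_words(h, vocab) for h in headlines]
    # df[j] = number of headlines containing token j (always >= 1 here,
    # because the vocabulary is built from these same headlines).
    df = [sum(1 for bow in bows if bow[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return [[count * weight for count, weight in zip(bow, idf)] for bow in bows]


# Hypothetical toy headlines, invented for illustration only.
headlines = [
    "wow ternyata ini rahasia artis terkenal",
    "pemerintah umumkan anggaran pendidikan 2022",
    "ternyata ini alasan harga cabai naik",
]

vocab = build_vocabulary(headlines)
weights = tf_idf(headlines, vocab)

# Show the non-zero TF-IDF weights of the first headline.
for token, column in vocab.items():
    if weights[0][column] > 0:
        print(f"{token}: {weights[0][column]:.3f}")
```

Vectors of this kind would then be passed to a conventional classifier such as an SVM, whereas the fine-tuned IndoBERT models operate on the tokenized headline directly and learn their own contextual representation.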