Automatic Thai Text Summarization Using Keyword-Based Abstractive Method
Parun Ngamcharoen, Nuttapong Sanglerdsinlapachai, P. Vejjanugraha
2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 5 November 2022
DOI: 10.1109/iSAI-NLP56921.2022.9960265
Citations: 0
Abstract
Traditionally, the training phase of abstractive text summarization involves feeding two sets of integer sequences into the model: the first, representing the source text, into the encoder, and the second, representing the words of the reference summary, into the decoder. However, with this method the model tends to perform poorly when the source text includes words that are irrelevant or insignificant to the key ideas. To address this issue, we propose a new keyword-based method for abstractive summarization that combines the information provided by the source text and its keywords to generate the summary. We utilize a bi-directional long short-term memory (BiLSTM) model for keyword labelling, using the words that overlap between the source text and the reference summary as ground truth. The results of our experiments on the ThaiSum dataset show that our proposed method outperforms the traditional encoder-decoder model by 0.0425 on ROUGE-1 F1, 0.0301 on ROUGE-2 F1, and 0.0140 on BERTScore F1.
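As a rough illustration of the keyword-labelling step the abstract describes, the sketch below derives binary token labels from the overlap between a source text and its reference summary, and defines a BiLSTM tagger to predict those labels. This is a minimal reconstruction under our own assumptions (PyTorch, illustrative class and hyperparameter names), not the authors' released implementation.

```python
# Hypothetical sketch of overlap-based keyword labelling plus a BiLSTM
# tagger, as outlined in the abstract. All names, dimensions, and the
# toy data below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

def overlap_labels(source_tokens, summary_tokens):
    """Label each source token 1 if it also appears in the reference
    summary (treated as a keyword), else 0 (ground truth for training)."""
    summary_vocab = set(summary_tokens)
    return [1 if tok in summary_vocab else 0 for tok in source_tokens]

class BiLSTMKeywordTagger(nn.Module):
    """Binary sequence tagger: for each token, predict keyword vs. not."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Bidirectional LSTM concatenates both directions: 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, 2)

    def forward(self, token_ids):                  # (batch, seq_len)
        states, _ = self.bilstm(self.embed(token_ids))
        return self.classifier(states)             # (batch, seq_len, 2)

# Toy usage with English stand-in tokens; the trained tagger's predicted
# keywords would then be combined with the source text as input to the
# encoder of a standard encoder-decoder summarizer.
source = ["rain", "floods", "bangkok", "yesterday", "according", "to", "reports"]
summary = ["bangkok", "floods", "after", "rain"]
print(overlap_labels(source, summary))  # [1, 1, 1, 0, 0, 0, 0]
```

One design note implied by the abstract: because the labels come from exact word overlap with the reference summary, the tagger is trained to flag precisely the tokens the decoder is likely to reuse, which is what lets the keyword signal suppress irrelevant source words.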