{"title":"多语言抽象文本摘要的深度体系结构","authors":"Amr M. Zaki, M. Khalil, Hazem M. Abbas","doi":"10.1109/ICCES48960.2019.9068171","DOIUrl":null,"url":null,"abstract":"Abstractive text summarization is the task of generating a novel summary given an article, not by merely extracting and selecting text to produce a summary, but by actually creating and understating the given text to produce a summary. LSTM seq2seq encoder-decoder with attention models have proved successful in this task, but they suffer from some problems. In this work, we would go through multiple models to try and solve these problems, beginning with simple seq2seq with attention models to going to Pointer-Generator, to using a curriculum learning approach called Scheduled-Sampling, till we reach the new approaches of combining reinforcement learning with seq2seq. We have applied these models on multiple datasets for multiple languages, English and Arabic. We have also introduced a new novel method of working with agglutinative languages, it is a preprocessing technique that is applied to the dataset which increases the relevancy of the vocabulary, which effectively increases the efficiency of the text summarization without modifying the models, we call this technique advanced cleaning, we have applied it to the Arabic dataset, and it can then be applied to any other agglutinative language. We have built these models in Jupiter notebooks to run seamlessly on Google colaboratory.11https://medium.com/@theamrzaki22https://github.com/theamrzaki/text_summurization_abstractive_methods","PeriodicalId":136643,"journal":{"name":"2019 14th International Conference on Computer Engineering and Systems (ICCES)","volume":"223 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Deep Architectures for Abstractive Text Summarization in Multiple Languages\",\"authors\":\"Amr M. Zaki, M. Khalil, Hazem M. Abbas\",\"doi\":\"10.1109/ICCES48960.2019.9068171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstractive text summarization is the task of generating a novel summary given an article, not by merely extracting and selecting text to produce a summary, but by actually creating and understating the given text to produce a summary. LSTM seq2seq encoder-decoder with attention models have proved successful in this task, but they suffer from some problems. In this work, we would go through multiple models to try and solve these problems, beginning with simple seq2seq with attention models to going to Pointer-Generator, to using a curriculum learning approach called Scheduled-Sampling, till we reach the new approaches of combining reinforcement learning with seq2seq. We have applied these models on multiple datasets for multiple languages, English and Arabic. We have also introduced a new novel method of working with agglutinative languages, it is a preprocessing technique that is applied to the dataset which increases the relevancy of the vocabulary, which effectively increases the efficiency of the text summarization without modifying the models, we call this technique advanced cleaning, we have applied it to the Arabic dataset, and it can then be applied to any other agglutinative language. 
We have built these models in Jupiter notebooks to run seamlessly on Google colaboratory.11https://medium.com/@theamrzaki22https://github.com/theamrzaki/text_summurization_abstractive_methods\",\"PeriodicalId\":136643,\"journal\":{\"name\":\"2019 14th International Conference on Computer Engineering and Systems (ICCES)\",\"volume\":\"223 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 14th International Conference on Computer Engineering and Systems (ICCES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCES48960.2019.9068171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 14th International Conference on Computer Engineering and Systems (ICCES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCES48960.2019.9068171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstractive text summarization is the task of generating a novel summary for a given article: not merely extracting and selecting text to produce a summary, but actually understanding the given text and creating one. LSTM seq2seq encoder-decoder models with attention have proved successful at this task, but they suffer from several problems. In this work, we go through multiple models that try to solve these problems, starting from a simple seq2seq model with attention, moving to the Pointer-Generator, then to a curriculum-learning approach called Scheduled Sampling, until we reach recent approaches that combine reinforcement learning with seq2seq. We have applied these models to multiple datasets in multiple languages, English and Arabic. We have also introduced a novel method for working with agglutinative languages: a preprocessing technique, applied to the dataset, that increases the relevancy of the vocabulary and thereby improves the quality of the text summarization without modifying the models. We call this technique advanced cleaning; we have applied it to the Arabic dataset, and it can then be applied to any other agglutinative language. We have built these models in Jupyter notebooks to run seamlessly on Google Colaboratory.[1][2]

[1] https://medium.com/@theamrzaki
[2] https://github.com/theamrzaki/text_summurization_abstractive_methods
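The abstract names Scheduled Sampling as the curriculum-learning step between the Pointer-Generator and the reinforcement-learning models. As a minimal sketch of that idea (not the paper's code), the snippet below shows the core decision made at each decoder time step during training: feed the ground-truth token with probability eps, or feed back the model's own prediction. The `decoder_step` interface is a hypothetical stand-in for a real model, and the inverse-sigmoid decay is one of the schedules proposed by Bengio et al. (2015).

```python
import math
import random

def scheduled_sampling_prob(step: int, k: float = 1000.0) -> float:
    # Inverse-sigmoid decay (Bengio et al., 2015): starts near 1
    # (pure teacher forcing) and decays toward 0 (pure model feedback).
    return k / (k + math.exp(step / k))

def decode_with_scheduled_sampling(decoder_step, start_token, targets, train_step):
    # `decoder_step(prev_token) -> predicted_token` is a hypothetical
    # single-step decoder interface standing in for the real model.
    eps = scheduled_sampling_prob(train_step)
    prev, outputs = start_token, []
    for gold in targets:
        pred = decoder_step(prev)
        outputs.append(pred)
        # With probability eps feed the ground-truth token (teacher
        # forcing); otherwise feed back the model's own prediction.
        prev = gold if random.random() < eps else pred
    return outputs
```

Early in training eps is close to 1, so the decoder mostly sees gold tokens; as training progresses it is increasingly exposed to its own outputs, narrowing the train/inference mismatch (exposure bias) that motivates the technique.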
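The abstract does not spell out the rules of the advanced-cleaning preprocessing, only that it raises vocabulary relevancy for agglutinative languages without touching the models. One plausible form of such a step, sketched below purely as an assumption, is stripping common Arabic proclitics so that surface variants of a word collapse onto one vocabulary entry; the prefix list and segmentation policy here are illustrative, not the paper's actual rules.

```python
# Compound clitics (e.g. wa+al) are listed before their parts so the
# greedy match removes the longest prefix first.
ARABIC_PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و", "ف", "ب", "ك", "ل"]

def strip_proclitics(token: str, min_stem: int = 3) -> str:
    # Remove at most one leading clitic, and only if the remaining
    # stem is long enough to plausibly be a word on its own.
    for prefix in ARABIC_PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_stem:
            return token[len(prefix):]
    return token

def clean_text(text: str) -> str:
    # Apply the clitic stripping token by token.
    return " ".join(strip_proclitics(tok) for tok in text.split())
```

Under this sketch, a form like "والكتاب" ("and the book") maps to the same vocabulary entry as "كتاب" ("book"), which is the kind of vocabulary consolidation the abstract credits for the efficiency gain.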