The Summarization of Creative Content
Author: Olivier Toubia
Journal: IRPN: Anthropology & Cultural Studies Innovation (Topic)
Publication date: 2017-08-16
DOI: 10.2139/ssrn.3020131 (https://doi.org/10.2139/ssrn.3020131)
Citations: 3
Abstract
We study and model the process by which humans summarize creative documents (e.g., from a movie script to a synopsis). We develop a customized topic model based on Poisson Factorization and inspired by the creativity literature, which links the text in a summary to the text in the original document. Traditional Poisson Factorization approximates documents as positive combinations of topics, i.e., as points in the cone defined by a set of topics (in the Euclidean space defined by the words in the vocabulary). The model proposed here captures not only this “inside the cone” portion of a document, but also the “outside the cone” portion that is not explained by a combination of common topics. The model captures how these two types of content are weighed in summaries as compared to full documents. In addition, it captures writing norms that influence the extent to which each topic appears in summaries compared to full documents. We apply this model to a dataset of marketing academic papers and their abstracts, and to a dataset of movie scripts and their synopses. We illustrate a practical application of our research by creating a public, online interactive tool meant to serve as a “sounding board” for users interested in writing summaries of creative documents.
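The "inside the cone" and "outside the cone" split described in the abstract can be pictured with a small geometric sketch. This is only an illustration under simplifying assumptions, not the paper's model: it swaps the Poisson likelihood for a Euclidean least-squares projection, solved by projected gradient descent with nonnegative weights. The function name `project_onto_topic_cone`, the four-word vocabulary, and the toy topic and document vectors are all hypothetical.

```python
def project_onto_topic_cone(doc, topics, n_iter=500):
    """Approximate the point of the topic cone closest to `doc`.

    Illustrative only: minimizes squared Euclidean error (not a Poisson
    likelihood) over nonnegative topic weights via projected gradient descent.
    Returns (weights, inside, outside), where inside + outside == doc.
    """
    n_topics, n_words = len(topics), len(doc)
    # Conservative step size: 1 / squared Frobenius norm of the topic matrix,
    # which is no larger than 1 / (largest eigenvalue of the Hessian).
    step = 1.0 / sum(x * x for topic in topics for x in topic)
    weights = [0.0] * n_topics

    def combine(w):
        # Positive combination of topics: a point inside the cone.
        return [sum(w[k] * topics[k][v] for k in range(n_topics))
                for v in range(n_words)]

    for _ in range(n_iter):
        inside = combine(weights)
        residual = [doc[v] - inside[v] for v in range(n_words)]
        # Gradient step, then clip at zero so the weights stay nonnegative.
        weights = [
            max(0.0, weights[k] + step * sum(residual[v] * topics[k][v]
                                             for v in range(n_words)))
            for k in range(n_topics)
        ]

    inside = combine(weights)
    outside = [doc[v] - inside[v] for v in range(n_words)]
    return weights, inside, outside


# Toy data (hypothetical): two topics over a four-word vocabulary.
topics = [[1.0, 1.0, 0.0, 0.0],
          [0.0, 1.0, 1.0, 0.0]]
# The document is 2 * topic 0 + 1 * topic 1, plus 3 uses of a fourth word
# that no topic covers.
doc = [2.0, 3.0, 1.0, 3.0]

weights, inside, outside = project_onto_topic_cone(doc, topics)
print("topic weights:", weights)
print("inside the cone:", inside)
print("outside the cone:", outside)
```

In this toy case the recovered weights should land close to 2 and 1, and the "outside the cone" portion should sit almost entirely on the fourth word, which no topic explains. A model of the kind the abstract describes would additionally need to relate how each portion is weighted in the summary versus the full document, which this sketch does not attempt.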