PPORTAL: Public Domain Portuguese-language Literature Dataset
Mariana O. Silva, Clarisse Scofield, Mirella M. Moro
Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17416 (https://doi.org/10.5753/dsw.2021.17416)
Citations: 6
Abstract
Combining human expertise with book-consumer data may provide what is needed to keep pace with the constant changes in the book publishing market. Building and releasing datasets that fully cover the core elements of the book industry ecosystem is therefore crucial. However, little such work exists for non-English languages such as Portuguese. Hence, we introduce PPORTAL, a public domain Portuguese-language literature dataset composed of book-related metadata. After an overview of its construction process and content, we present a brief exploratory data analysis summarizing its main characteristics. We also highlight potential applications, showing how PPORTAL can serve as a resource in different research domains.