Antônio Diogo Forte Martins, Lucas Cabral, Pedro Jorge Chaves Mourão, Ivandro Claudino de Sá, José Maria S. Monteiro, Javam C. Machado
Nowadays, our society suffers from a major issue that is, unfortunately, becoming increasingly problematic, once again through social networks: misinformation. The primary source of misinformation in Brazil is the messaging application WhatsApp. However, due to WhatsApp's private messaging nature, there are still few misinformation data sets built specifically from this platform. In this context, building a data set of WhatsApp messages about COVID-19 in Brazilian Portuguese and labeling the misinformation messages within it becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about the coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled.
COVID19.BR: A Dataset of Misinformation about COVID-19 in Brazilian Portuguese WhatsApp Messages. Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17422 (https://doi.org/10.5753/dsw.2021.17422).
Mariana O. Silva, Clarisse Scofield, Mirella M. Moro
Combining human expertise with book-consumer data may provide what is needed to keep pace with the constant changes in the book publishing market. Therefore, building and making available datasets that fully cover the key elements of the book industry ecosystem is essential. However, little has been done in this context for non-English languages such as Portuguese. Hence, we introduce PPORTAL, a public domain Portuguese-language literature dataset composed of book-related metadata. After an overview of its building process and content, we present a brief exploratory data analysis summarizing its main characteristics. We also highlight potential applications, showing how PPORTAL is useful as a resource in different research domains.
PPORTAL: Public Domain Portuguese-language Literature Dataset. Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17416 (https://doi.org/10.5753/dsw.2021.17416).
Gabriel P. Oliveira, Gabriel R. G. Barbosa, Bruna C. Melo, Mariana O. Silva, Danilo B. Seufitelli, Mirella M. Moro
Music is a living industry with an increasing volume of complex data that creates new challenges and opportunities for knowledge extraction, benefiting not only the different music segments but also the Music Information Retrieval (MIR) community. In this paper, we present MUHSIC, a novel dataset with enhanced information on musical success. We focus on artists and genres, combining chart-related data with acoustic metadata to describe the temporal evolution of musical careers. The enriched and curated data allow building success-based time series to investigate high-impact periods (hot streaks) in such careers, transforming complex data into knowledge. Overall, MUHSIC is a relevant tool for music-related tasks due to its ease of use and replicability.
MUHSIC: An Open Dataset with Temporal Musical Success Information. Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17415 (https://doi.org/10.5753/dsw.2021.17415).
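The abstract does not define a hot streak precisely. As a hypothetical illustration only, assume a per-period success score for an artist (for example, chart points per week) and define a hot streak as the longest run of consecutive periods scoring strictly above a threshold, the career median by default. Under those assumptions, which are not MUHSIC's own definition, a success-based time series could be scanned like this (the function name `hot_streak` is illustrative):

```python
from statistics import median


def hot_streak(scores, threshold=None):
    """Return (start, end), inclusive indices of the longest run of consecutive
    periods whose score is strictly above `threshold`, or None if there is none.

    Hypothetical definition for illustration: `threshold` defaults to the
    median of the whole career series.
    """
    if not scores:
        return None
    if threshold is None:
        threshold = median(scores)
    best = None
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new above-threshold run begins here
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)  # strict '>' keeps the earliest run on ties
        else:
            start = None  # the run is broken
    return best


weekly = [1, 5, 6, 7, 2, 8, 1]
print(hot_streak(weekly))  # median is 5, so this prints (2, 3)
```

A fixed threshold can be passed instead of the median, for example `hot_streak(scores, threshold=50)`, if success is measured against an absolute chart level rather than the artist's own career.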
E. Rocha, Henrique Maio, D. Menasché, Claudio Miceli
There is a growing need for insightful and meaningful analytics within eSports: be it to entertain spectators as they watch their favorite teams compete, to automatically identify and catch cheaters, or even to gain a competitive edge over an opponent, there is a plethora of potential applications for analytics within the scene. It follows, then, that there is also a need for well-structured and organized datasets that enable efficient data exploration and serve as the foundation for the visualization and analytics layers. Because of this, the entire process, from data collection at the source to the means of accessing the desired information, needs to be planned to meet those requirements. Our work provides the means to construct such a dataset for the Counter-Strike Global Offensive (CS:GO) game, thus opening up a range of possible applications on top of the data.
Extracting and Composing a Dataset of Competitive Counter-Strike Global Offensive Matches. Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17412 (https://doi.org/10.5753/dsw.2021.17412).
M. Cazzolato, L. C. Scabora, Guilherme F. Zabot, M. A. Gutierrez, Caetano Traina Jr., A. Traina
In this paper, we present FeatSet, a compilation of visual features extracted from open image datasets reported in the literature. FeatSet comprises 11 visual features, consisting of color, texture, and shape representations of the images from 13 datasets. We organized the features in a standard collection, including metadata and labels when available. We also describe the domain of each dataset included in our collection, with a visual analysis using Multidimensional Scaling (MDS) and Principal Component Analysis (PCA). FeatSet is recommended for supervised and unsupervised learning, and it also broadly supports Content-Based Image Retrieval (CBIR) applications and complex data indexing using Metric Access Methods (MAMs).
FeatSet: A Compilation of Visual Features Extracted from Public Image Datasets. Anais do III Dataset Showcase Workshop (DSW 2021). DOI: 10.5753/dsw.2021.17417 (https://doi.org/10.5753/dsw.2021.17417).
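FeatSet's visual analysis uses PCA (alongside MDS) to project high-dimensional feature vectors into two dimensions. As a rough, dependency-free illustration, and not the authors' procedure, the sketch below computes a PCA projection with power iteration plus deflation. The function names and toy feature vectors are hypothetical; in practice a numerical library would normally be used instead.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-dimension mean from every feature vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # the matrix maps v to zero: no variance left along v
            return 0.0, v
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T A v is the eigenvalue estimate.
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return value, v


def pca_project(rows, k=2):
    """Project feature vectors onto their top-k principal components."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


features = [[-1.0, 5.0], [3.0, 5.0], [1.0, 4.0], [1.0, 6.0]]  # toy 2-D "features"
print(pca_project(features))
```

The sign of each principal component is arbitrary, so the projected coordinates may come out mirrored; only their magnitudes and relative layout are meaningful for a visual analysis.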