
Proceedings of the Brazilian Symposium on Multimedia and the Web: Latest Publications

Multimodal intent classification with incomplete modalities using text embedding propagation
Pub Date : 2021-09-27 DOI: 10.1145/3470482.3479636
Victor Machado Gonzaga, Nils Murrugarra-Llerena, R. Marcacini
Determining the author's intent in a social media post is a challenging multimodal task and requires identifying complex relationships between image and text in the post. For example, the post image can represent an object, person, product, or company, while the text can be an ironic message about the image content. Similarly, a text can be a news headline, while the image represents a provocation, meme, or satire about the news. Existing approaches propose intent classification techniques combining both modalities. However, some posts may have missing textual annotations. Hence, we investigate a graph-based approach that propagates available text embedding data from complete multimodal posts to incomplete ones. This paper presents a text embedding propagation method, which transfers embeddings from BERT neural language models to image-only posts (i.e., posts with incomplete modality) considering the topology of a graph constructed from both visual and textual modalities available during the training step. By using this inference approach, our method provides competitive results when textual modality is available at different completeness levels, even compared to reference methods that require complete modalities.
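The propagation idea in the abstract can be sketched as follows. This is a generic neighbour-averaging sketch, not the authors' actual method: the multimodal graph and the BERT text embeddings are assumed to be given, and posts with available text keep their embeddings fixed while image-only posts iteratively receive the average of their neighbours' embeddings.

```python
def propagate_embeddings(edges, embeddings, n_iters=10):
    """Fill in missing node embeddings by iteratively averaging neighbours.

    edges      -- list of (u, v) undirected edges of the multimodal graph
    embeddings -- dict: node -> vector (list of floats) for posts whose
                  textual modality is available; image-only posts are absent
    Returns a dict with an embedding for every node reachable from a
    node that has one.
    """
    # Build adjacency lists for the undirected graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    emb = dict(embeddings)  # complete-modality posts act as fixed anchors
    for _ in range(n_iters):
        updates = {}
        for node in adj:
            if node in embeddings:      # never overwrite an anchor
                continue
            neigh = [emb[n] for n in adj[node] if n in emb]
            if not neigh:
                continue
            dim = len(neigh[0])
            updates[node] = [sum(v[i] for v in neigh) / len(neigh)
                             for i in range(dim)]
        emb.update(updates)
    return emb
```

For example, an image-only post connected to two textual posts ends up with the mean of their embeddings, which can then be fed to any downstream intent classifier.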
Citations: 1
Learning Textual Representations from Multiple Modalities to Detect Fake News Through One-Class Learning
Pub Date : 2021-09-27 DOI: 10.1145/3470482.3479634
M. Gôlo, M. C. D. Souza, R. G. Rossi, S. O. Rezende, B. Nogueira, R. Marcacini
Fake news can rapidly spread through internet users. Approaches proposed in the literature for content classification usually learn models considering textual and contextual features from real and fake news to minimize the spread of disinformation. One of the prominent approaches to detect fake news is One-Class Learning (OCL), as it minimizes the data labeling effort, requiring only the labeling of fake news documents. The performance of these algorithms depends on the structured representation of the documents used in the learning process. Generally, a text-based unimodal representation is used, such as bag-of-words or representations based on linguistic categories. We propose MVAE-FakeNews, a multimodal representation method to detect fake news in OCL. The proposed approach uses a Multimodal Variational Autoencoder that learns a new representation from the combination of two modalities considered promising for fake news detection: text embeddings and topic information. In the experiments, we used three datasets in Portuguese and English. Results show that MVAE-FakeNews obtained a better F1-Score for the class of interest, outperforming nine other methods in ten of twelve evaluated scenarios. MVAE-FakeNews also achieved a better average ranking, with a statistically significant difference from the other representation models. The proposed method proved promising for representing texts in the OCL scenario to detect fake news.
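A minimal illustration of the One-Class Learning setting described above, under strong simplifying assumptions: the paper's learned MVAE latent space is replaced here by plain concatenation of the two modality vectors, and the one-class model by a hypothetical centroid-plus-radius rule fit only on the interest class (fake news).

```python
import math

def fuse(text_vec, topic_vec):
    """Toy stand-in for the multimodal representation: plain concatenation.
    (The paper instead learns a joint latent space with a Multimodal
    Variational Autoencoder.)"""
    return list(text_vec) + list(topic_vec)

def train_one_class(fused_fake_news):
    """OCL needs labels only for the interest class: model it here as the
    centroid of the fake-news vectors plus the maximum training distance."""
    dim = len(fused_fake_news[0])
    centroid = [sum(v[i] for v in fused_fake_news) / len(fused_fake_news)
                for i in range(dim)]
    radius = max(math.dist(v, centroid) for v in fused_fake_news)
    return centroid, radius

def is_fake(vec, centroid, radius):
    # Inside the learned hypersphere -> classified as the interest class.
    return math.dist(vec, centroid) <= radius
```

Real OCL classifiers (e.g. one-class SVMs) replace this distance rule, but the training regime, fitting on fake-news documents only, is the same.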
Citations: 7
Towards Understanding the Use of Telegram by Political Groups in Brazil
Pub Date : 2021-09-27 DOI: 10.1145/3470482.3479640
Manoel Júnior, P. Melo, Ana P. C. Silva, Fabrício Benevenuto, J. Almeida
Instant messaging platforms such as Telegram and WhatsApp have become one of the main means of communication used by people all over the world. In most of these services, communities are created around so-called groups and channels, allowing easy, encrypted, and instantaneous information exchange. With the political debate gaining widespread public attention and becoming permeated with intense discussion and polarization, especially in a context in which far-right communities are being banned from mainstream social networks such as Twitter, YouTube, and Facebook, alternative platforms like Telegram have become very popular, as they are sought as a "free space for discussion" and abused for the dissemination of misinformation and hate speech. This work presents a data analysis of Brazilian public Telegram groups and channels devoted to political discussion, examining the network created on the platform and taking a closer look at the dynamics of messages and members. Our findings show that political mobilization on Telegram has increased substantially in recent years, suggesting a mass migration from other mainstream platforms. We find that Telegram's large-group structure is effective in spreading messages through the network, with content being viewed by numerous users and forwarded multiple times. Looking at the messages, we find an expressive interplay between Telegram and external web pages, notably YouTube and other social networks. Furthermore, we observed a relevant amount of messages attacking political personalities and spreading unchecked content about the COVID-19 pandemic. Taken together, we perform an extensive study of how political discussion advanced on Telegram in Brazil and how political groups operate on this alternative messaging application.
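The message-forwarding dynamics mentioned above are commonly analyzed as a directed graph. The following is an illustrative sketch, not the authors' pipeline, and the message field names (`group`, `forwarded_from`) are hypothetical:

```python
from collections import Counter

def build_forwarding_network(messages):
    """Build a weighted, directed content-flow graph from scraped messages.

    Each message is a dict with 'group' (where it was posted) and, when it
    is a forward, 'forwarded_from' (the originating group/channel).
    Edge (src, dst) counts how often content flowed from src into dst.
    """
    edges = Counter()
    for msg in messages:
        src = msg.get("forwarded_from")
        if src and src != msg["group"]:   # ignore self-forwards
            edges[(src, msg["group"])] += 1
    return edges
```

Edge weights in such a graph support the kind of spread analysis the paper reports, e.g. finding channels whose content is repeatedly forwarded into many large groups.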
Citations: 11
Identifying Criminal Suspects on Social Networks: A Vocabulary-Based Method
Pub Date : 2020-11-30 DOI: 10.1145/3428658.3431091
Érick S. Florentino, R. Goldschmidt, M. C. Cavalcanti
Identifying crime suspects on social networks is one of the most relevant tasks in the analysis of this type of network. Most computational methods for this task rely on supervised machine learning and therefore require previously labeled datasets indicating which of the registered people, messages, and/or conversations are suspect. In practice, however, this kind of information is rarely available, among other reasons because it is scarce or even protected by legally guaranteed secrecy. This limitation makes it very difficult to use these methods effectively in real situations. Hence, the present work raises the hypothesis that the use of a controlled vocabulary for the field of application can make it possible to identify suspects in social networks without the need for previously labeled datasets. In search of experimental evidence for the validity of this hypothesis, this article proposes a generic method that uses a controlled vocabulary with terms categorized according to a given domain (e.g., pedophilia, cyberbullying, terrorism) to analyze messages exchanged on social networks and identify criminal suspects. The results obtained in a preliminary experiment in the pedophilia domain indicate the adequacy of the proposed method.
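A minimal sketch of how such a vocabulary-based scorer could work, assuming a hand-built vocabulary where each category carries a set of domain terms and a severity weight (the category names, weights, and scoring rule below are illustrative, not the paper's actual vocabulary):

```python
import re

def score_suspects(messages, vocabulary):
    """Score authors by matches against a categorized controlled vocabulary.

    messages   -- list of (author, text) pairs
    vocabulary -- dict: category -> (set_of_terms, weight); a message that
                  hits any term of a category contributes that weight once
    Returns dict: author -> accumulated score (no labeled data needed).
    """
    scores = {}
    for author, text in messages:
        tokens = set(re.findall(r"\w+", text.lower()))
        hit = sum(weight
                  for (terms, weight) in vocabulary.values()
                  if tokens & terms)
        if hit:
            scores[author] = scores.get(author, 0) + hit
    return scores
```

Authors above a domain-tuned score threshold would then be flagged for human review rather than automatically accused.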
Citations: 3
Evaluating Early Fusion Operators at Mid-Level Feature Space
Pub Date : 2020-11-30 DOI: 10.1145/3428658.3431079
Antonio A. R. Beserra, R. M. Kishi, R. Goularte
Early fusion techniques have been proposed for video analysis tasks as a way to improve efficacy by generating compact data models capable of keeping the semantic clues present in multimodal data. First attempts to fuse multimodal data employed fusion operators in the low-level feature space, losing data representativeness. This drove later research efforts to evolve simple operators into complex operations, which became, in general, inseparable from the processing of multimodal semantic clues. In this paper, we investigate the application of early multimodal fusion operators in the mid-level feature space. Five different operators (Concatenation, Sum, Gram, Average, and Maximum) were employed to fuse mid-level multimodal video features. The fused data derived from each operator were then used as input for two different video analysis tasks: Temporal Video Scene Segmentation and Video Classification. For each task, we performed a comparative analysis between the operators and related-work techniques designed for these tasks using complex fusion operations. The efficacy results reached by the operators were very close to those reached by the techniques, providing strong evidence that working in a more homogeneous feature space can reduce known low-level fusion drawbacks. In addition, operators make data fusion separable, allowing researchers to keep the focus on developing semantic-clue representations.
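On equal-length feature vectors, the element-wise operators named above are one-liners; the sketch below shows plausible definitions (the Gram operator is interpreted here as the matrix of pairwise inner products between modality vectors, which is an assumption, not the paper's stated formula):

```python
def concat_op(vectors):
    """Concatenation: keeps every dimension of every modality."""
    return [x for v in vectors for x in v]

def sum_op(vectors):
    """Sum: element-wise sum across modalities (requires equal lengths)."""
    return [sum(xs) for xs in zip(*vectors)]

def avg_op(vectors):
    """Average: element-wise mean across modalities."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

def max_op(vectors):
    """Maximum: element-wise max, keeping the strongest response."""
    return [max(xs) for xs in zip(*vectors)]

def gram_op(vectors):
    """Gram (assumed reading): pairwise inner products between modality
    vectors, capturing cross-modal correlations."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors]
            for u in vectors]
```

Note that only Concatenation grows the dimensionality with the number of modalities; Sum, Average, and Maximum keep it fixed, which is part of why the fused models stay compact.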
Citations: 2
Using Data Augmentation and Neural Networks to Improve the Emotion Analysis of Brazilian Portuguese Texts
Pub Date : 2020-11-30 DOI: 10.1145/3428658.3431080
Vinícius Veríssimo, Rostand E. O. Costa
Information and Communication Technologies are an interesting alternative for mitigating barriers that arise in the communication of information, mainly through technologies aimed at the machine translation of oral-language content into sign language. Despite years of improvement, the use of these technologies still divides the opinions of the Deaf Community, due to the low emotional expressiveness of 3D avatars. Therefore, as a way to assist the machine translation of oral-language texts into sign language, this study aims to evaluate the influence of the parameters of a data augmentation method on a textual dataset, together with the use of neural networks for emotion analysis of Brazilian Portuguese texts. The analysis of emotions in texts presents a relevant challenge because of the nuances and varied forms of expression of human language. In this context, deep neural networks have gained considerable ground as a way to deal with these challenges, mainly through algorithms that treat emotion analysis as a text classification task, such as the MultiFiT approach. To circumvent the scarcity of Brazilian Portuguese data for this task, some strategies for augmenting data were evaluated and applied to improve the dataset used in training. The emotion analysis experiments with Transfer Learning reached accuracy above 94% in the best case.
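One classic parameterized text-augmentation move that such a study could tune is random word swapping; the sketch below illustrates the idea in general, and is not the specific augmentation method the paper evaluates:

```python
import random

def random_swap(words, n_swaps, rng):
    """Return a copy of the word list with n_swaps random position swaps."""
    words = list(words)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def augment(sentence, n_copies=4, swap_ratio=0.1, seed=0):
    """Generate n_copies noisy variants of a labeled training sentence.

    swap_ratio is the kind of augmentation parameter whose influence a
    study like this one measures; the seed keeps the output reproducible.
    """
    rng = random.Random(seed)
    words = sentence.split()
    n_swaps = max(1, int(len(words) * swap_ratio))
    return [" ".join(random_swap(words, n_swaps, rng))
            for _ in range(n_copies)]
```

Each variant keeps the original emotion label, multiplying the effective size of a scarce Brazilian Portuguese training set.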
Citations: 3
An Approach for Improving DBpedia as a Research Data Hub
Pub Date : 2020-11-30 DOI: 10.1145/3428658.3431075
Jean Gabriel Nguema Ngomo, G. R. Lopes, M. Campos, M. C. Cavalcanti
Extracted from Wikipedia content, DBpedia is considered one of the most important knowledge bases of the Semantic Web, with editions in several languages, among them English (DBpedia EN) and Portuguese (DBpedia PT). All DBpedia editions are subject to quality issues; in particular, DBpedia PT suffers from inconsistencies and a lack of data in several domains. This paper describes a semi-automatic and incremental process for publishing data from reliable external sources on DBpedia while seeking to improve aspects of its quality. In an open-science context, the proposal aims at consolidating DBpedia as a reference hub for research data, so that research in any area supported by Semantic Web data can use it reliably. Although the approach is independent of a specific DBpedia edition, the supporting prototype tool, named ETL4DBpedia, was built for DBpedia PT based on ETL (Extract, Transform, Load) workflows. This paper also describes an assessment of the approach, applying the tool in a real-usage scenario involving data from the field of botany. This application increased the completeness of medicinal plant species in DBpedia PT by 127%, besides showing satisfactory performance for the ETL4DBpedia components.
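The extract-transform-load shape of such a workflow can be sketched in a few lines. The record fields, resource URIs, and the `usoMedicinal` property below are illustrative placeholders, not the actual ETL4DBpedia mappings:

```python
def etl_plants(raw_records):
    """Minimal ETL sketch: extract fields from a source dump, transform
    them into cleaned values, load them as N-Triples lines ready for
    publication on a DBpedia edition."""
    triples = []
    for rec in raw_records:                       # Extract
        name = rec.get("scientific_name", "").strip()
        use = rec.get("medicinal_use", "").strip()
        if not name or not use:                   # Transform: drop incomplete rows
            continue
        subject = "http://pt.dbpedia.org/resource/" + name.replace(" ", "_")
        triples.append(                           # Load: serialize as N-Triples
            '<%s> <http://pt.dbpedia.org/property/usoMedicinal> "%s" .'
            % (subject, use))
    return triples
```

An incremental run would diff these triples against the ones already published, inserting only what is new or corrected.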
Citations: 3
An Ontology-based Information Model for Multi-Domain Semantic Modeling and Analysis of Smart City Data
Pub Date : 2020-11-30 DOI: 10.1145/3428658.3430973
B. Rocha, Larysse Silva, T. Batista, Everton Cavalcante, Porfírio Gomes
Smart city services are typically defined according to domains (e.g., health, education, safety) and supported by different systems. Consequently, the analysis of smart city data is often domain-specific, limiting the capabilities of the offered services and hampering decision-making that relies on isolated domain information. To support suitable analysis across multiple domains, it is necessary to have a unified data model able to handle the inherent heterogeneity of smart city data and to take into account both geographic and citizen information. This paper presents an ontology-based information model that supports multi-domain analysis in smart cities, fostering interoperability and powerful automated reasoning over unambiguous information. The proposed information model follows Linked Data principles and takes advantage of ontologies to define information semantically. The semantic relationships and properties defined in the model also allow inferring new pieces of information, improving accuracy when analyzing multiple city domains. This paper reports an evaluation of the information model through ontological metrics and competency questions.
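The kind of automated inference an ontology enables can be sketched with a toy transitive-property rule over subject-predicate-object triples; the `locatedIn` property and the entity names are hypothetical, not taken from the paper's model:

```python
def infer_transitive(triples, prop):
    """Materialize the transitive closure of one property.

    triples -- iterable of (subject, predicate, object) tuples
    prop    -- the predicate declared transitive in the ontology
    E.g. a sensor located in a street located in a district is inferred
    to be located in that district as well.
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = {(s, o) for s, p, o in facts if p == prop}
        for s, o in pairs:
            for o2 in {oo for ss, oo in pairs if ss == o}:
                if (s, prop, o2) not in facts:
                    facts.add((s, prop, o2))
                    changed = True
    return facts
```

In practice an OWL reasoner performs this (and much richer rules) over the ontology, but the effect is the same: new cross-domain facts derived from the ones explicitly modeled.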
Citations: 8
Analyzing the Use of COVID-19 Ads on Facebook
Pub Date: 2020-11-30 DOI: 10.1145/3428658.3431088
Márcio Silva, Fabrício Benevenuto
In view of the mobility restrictions and social isolation imposed by the coronavirus (COVID-19) pandemic, digital media, and especially social networks, have become a breeding ground for fake news, political attacks, and large-scale misinformation. The impact of this 'infodemic' can be even greater when sponsored content is used on social networks, such as Facebook ads. Using the Facebook Ad Library, we collected more than 236k Facebook ads from 75 different countries. Focusing on ads from Brazil, we found ads containing political attacks, requests for donations, and doctors prescribing vitamin D as a weapon against the coronavirus, among other content with evidence of misinformation.
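The screening step described above can be sketched with a simple keyword filter over a collected ad corpus. The ad records and watched terms below are illustrative only, not the paper's actual data or detection method:

```python
# Hypothetical ad records of the shape one might export from an ad archive.
ads = [
    {"country": "BR", "text": "Doctors recommend vitamin D to cure coronavirus"},
    {"country": "BR", "text": "Donate now to help families affected by COVID-19"},
    {"country": "US", "text": "Stay home, save lives"},
]

# Illustrative watchlist of terms associated with suspicious COVID-19 claims.
SUSPECT_TERMS = ("vitamin d", "cure coronavirus", "donate")

def flag_ads(ads, country, terms):
    """Return ads from one country whose text mentions any watched term."""
    return [
        ad for ad in ads
        if ad["country"] == country
        and any(term in ad["text"].lower() for term in terms)
    ]

flagged = flag_ads(ads, "BR", SUSPECT_TERMS)
print(len(flagged))  # → 2
```

A real pipeline would combine such filters with manual inspection, since keyword matches alone surface candidates rather than confirm misinformation.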
Citations: 2
Collaborative Filtering Strategy for Product Recommendation Using Personality Characteristics of Customers
Pub Date: 2020-11-30 DOI: 10.1145/3428658.3430969
J. J. B. Aguiar, J. Fechine, E. Costa
Research indicates that people can receive more useful product recommendations if the filtering process considers their personality. In this paper, we propose a hybrid strategy for recommender systems (using matrix factorization and a personality-based neighborhood) to recommend the best products for a particular customer (user). The user profile used to define the neighborhood involves three personality models: Big Five (also known as OCEAN, or the Five-Factor Model), Needs, and Values. We experimented with data from more than 10,000 Amazon customers, inferring their personality characteristics from review text via IBM Watson Personality Insights. The results indicate that the proposed strategy outperformed the state-of-the-art algorithms analyzed. Moreover, there was no statistically significant difference between using only the Big Five model and using it together with the Needs and Values models.
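The hybrid idea can be sketched as follows: neighbors are chosen by personality similarity (here, cosine similarity over Big Five trait vectors), and their similarity-weighted ratings are blended with a matrix-factorization score. The trait scores, ratings, and blending weight below are made up for illustration; this is not the paper's exact algorithm.

```python
import math

# Hypothetical Big Five vectors:
# [openness, conscientiousness, extraversion, agreeableness, neuroticism]
big_five = {
    "alice": [0.8, 0.6, 0.4, 0.7, 0.3],
    "bob":   [0.7, 0.5, 0.5, 0.6, 0.4],
    "carol": [0.1, 0.9, 0.2, 0.3, 0.8],
}
ratings = {"bob": {"item1": 5.0}, "carol": {"item1": 2.0}}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def neighborhood_pred(user, item):
    """Personality-similarity-weighted average of neighbors' ratings."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            s = cosine(big_five[user], big_five[other])
            num += s * r[item]
            den += s
    return num / den if den else 0.0

def hybrid_pred(user, item, mf_pred, alpha=0.5):
    """Convex combination of a matrix-factorization score and the
    personality-based neighborhood score."""
    return alpha * mf_pred + (1 - alpha) * neighborhood_pred(user, item)

# Alice is much more similar to bob than to carol, so the neighborhood
# score is pulled toward bob's rating of 5.0.
score = hybrid_pred("alice", "item1", mf_pred=4.0)
```

In practice the matrix-factorization term would come from a trained model rather than a constant, and `alpha` would be tuned on held-out data.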
Citations: 6