
Companion Proceedings of the Web Conference 2021: Latest Publications

Progressive Semantic Reasoning for Image Inpainting
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451142
J. Jin, Xinrong Hu, Kai He, Tao Peng, Junping Liu, Jie Yang
Image inpainting aims to reconstruct the missing or unknown region for a given image. As one of the most important topics from image processing, this task has attracted increasing research interest over the past few decades. Learning-based methods have been employed to solve this task, and achieved superior performance. Nevertheless, existing methods often produce artificial traces, due to the lack of constraints on image characterization under different semantics. To accommodate this issue, we propose a novel artistic Progressive Semantic Reasoning (PSR) network in this paper, which is composed of three shared parameters from the generation network superposition. More precisely, the proposed PSR algorithm follows a typical end-to-end training procedure, that learns low-level semantic features and further transfers them to a high-level semantic network for inpainting purposes. Furthermore, a simple but effective Cross Feature Reconstruction (CFR) strategy is proposed to tradeoff semantic information from different levels. Empirically, the proposed approach is evaluated via intensive experiments using a variety of real-world datasets. The results confirm the effectiveness of our algorithm compared with other state-of-the-art methods. The source code can be found from https://github.com/sfwyly/PSR-Net.
Citations: 2
How to See Smells: Extracting Olfactory References from Artworks
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453710
Mathias Zinnen
Although it is an essential part of how we experience the world, smell is severely undervalued in the context of cultural heritage. The Odeuropa project aims at preserving and recreating the olfactory heritage of Europe. State-of-the-art methods of artificial intelligence are applied to large corpora of visual and textual data ranging from the 16th to the 20th century of European history to extract olfactory references. Creating an ontology of smells, this information is stored in the “European Olfactory Knowledge Graph (EOKG)” following standards of the semantic web. My Ph.D. addresses the visual extraction part of the project. We will create a taxonomy of visual smell references and acquire a large corpus of artworks from various early modern European digital collections. Using computer vision techniques, we will implement a pipeline for the combined recognition of olfactory objects, poses, and iconographies, and annotate the images from our image corpus accordingly. Following these steps, we will address the following research questions: (i) What visual representations of smell exist in European 16th- to 20th-century works of art, and how can these be represented in the EOKG as an ontology shared with the other work packages of the Odeuropa project? (ii) Which machine-learning techniques exist for the automated extraction of olfactory references in the visual arts? In particular, which techniques are suited to cope with the domain-shift problem when applying computer vision techniques to our field of research? (iii) How do the identified techniques perform in terms of established evaluation metrics? Which ones work best for the extraction of olfactory references? Both the preservation of olfactory heritage [3] and the application of machine learning (ML) to cultural heritage [1] have been addressed before. However, in most cases machine learning algorithms are treated as “black boxes” and their application does not contribute back to ML [4]. Computer vision techniques like object detection and pose estimation have successfully been applied to the domain of visual arts ([8], [2]) but have not achieved performance comparable to their application in the photographic domain. One reason for the success of computer vision on photographs is the availability of huge labeled datasets like ImageNet [10]. Datasets containing artworks
Citations: 2
EUDETECTOR: Leveraging Language Model to Identify EU-Related News
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452324
Koustav Rudra, Danny Tran, M. Shaltev
News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (more precisely, the EU) and propose a method to identify news that has an impact on Europe from any aspect, such as finance, business, crime, or politics. Predicting the location of the news is itself a challenging task. Most approaches restrict themselves to named entities or handcrafted features. In this paper, we try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and some hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The language-model-based European news detector shows an improvement of about 9-19% in F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.
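One simple, model-agnostic way to probe which tokens drive a classifier's decision, in the spirit of the rationale-token analysis above, is leave-one-out scoring: remove a token and measure how much the positive-class score drops. The sketch below uses an invented cue-word scorer as a stand-in; the actual system would query the fine-tuned BERT model's probability for the "EU-related" class.

```python
def rationale_scores(tokens, score_fn):
    """Leave-one-out importance: how much the positive-class score drops
    when each token is removed from the input."""
    base = score_fn(tokens)
    scores = {}
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        scores[tok] = base - score_fn(reduced)
    return scores

# Stand-in scorer (assumption, for illustration only): fraction of tokens
# that are EU-related cue words. The real system would instead return the
# fine-tuned BERT probability for the "EU-related" class.
CUES = {"brussels", "eu", "parliament"}

def toy_score(tokens):
    return sum(t.lower() in CUES for t in tokens) / max(len(tokens), 1)
```

With this scorer, a cue token such as "Brussels" receives a higher leave-one-out importance than filler tokens, mirroring how entity tokens emerge as rationales in the paper.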
Citations: 0
PolyU-CBS at the FinSim-2 Task: Combining Distributional, String-Based and Transformers-Based Features for Hypernymy Detection in the Financial Domain
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451387
Emmanuele Chersoni, Chu-Ren Huang
In this contribution, we describe the systems presented by the PolyU CBS team at the second Shared Task on Learning Semantic Similarities for the Financial Domain (FinSim-2), where participating teams had to identify the right hypernyms for a list of target terms from the financial domain. For this task, we ran our classification experiments with several distributional, string-based, and Transformer-based features. Our results show that a simple logistic regression classifier, when trained on a combination of word embeddings, semantic and string similarity metrics, and BERT-derived probabilities, achieves strong performance (above 90%) in financial hypernymy detection.
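A minimal sketch of the mixed feature set described above, combining a distributional similarity with a string-overlap similarity for a (term, candidate hypernym) pair. The embeddings and term names are invented placeholders, and the BERT-derived probability feature is omitted:

```python
import math
from difflib import SequenceMatcher

# Toy vectors standing in for real pre-trained word embeddings (assumption).
EMB = {
    "bond": [0.9, 0.1, 0.2],
    "debt instrument": [0.8, 0.2, 0.3],
    "equity": [0.1, 0.9, 0.4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def features(term, candidate_hypernym):
    """Combine a distributional feature with a string-overlap feature."""
    return [
        cosine(EMB[term], EMB[candidate_hypernym]),               # distributional
        SequenceMatcher(None, term, candidate_hypernym).ratio(),  # string-based
    ]
```

Such feature vectors would then be fed to a logistic regression classifier, as in the paper.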
Citations: 7
FinSBD-2021: The 3rd Shared Task on Structure Boundary Detection in Unstructured Text in the Financial Domain
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451378
Willy Au, Abderrahim Ait-Azzi, Juyeon Kang
Document processing is a foundational pre-processing task for natural language applications in the financial domain. In this paper, we present the results of FinSBD-3, the 3rd shared task on Structure Boundary Detection in unstructured text in the financial domain. The shared task was organized as part of the 1st Workshop on Financial Technology on the Web. Participants were asked to create systems detecting the boundaries of elements in unstructured text extracted from financial PDFs. This edition extends the previous shared tasks by adding boundaries of visual elements such as tables, figures, page headers, and page footers, on top of the sentences, lists, and list items already present in previous editions of the shared task.
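Systems for this kind of task typically predict, for each token, whether it begins, continues, or lies outside an element. A small sketch of turning gold element spans into such per-token labels (the BIO-style label names here are an assumption for illustration, not the official task format):

```python
# Convert annotated element spans (start, end, type), with inclusive token
# indices, into per-token BIO boundary labels.
def spans_to_labels(n_tokens, spans):
    labels = ["O"] * n_tokens
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"           # element begins here
        for i in range(start + 1, end + 1):
            labels[i] = f"I-{etype}"           # element continues
    return labels
```

A boundary detector is then trained to recover these labels from raw token sequences extracted from the PDFs.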
Citations: 4
Modeling Text Data Over Time - Example on Job Postings
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453707
Jakob Jelencic
Modelling multilingual text data over time is a challenging task. This PhD focuses on semantic representations of domain-specific, short to mid-length, time-stamped textual data. The proposed method is evaluated on the example of job postings, where we model demand for IT jobs. More specifically, we address the following three problems: unifying the representation of multilingual text data; clustering similar textual data; and using the proposed semantic representation to model and predict future demand for jobs. This work starts with a problem statement, followed by a description of the proposed approach and methodology, and concludes with an overview of the first results and a summary of the ongoing research.
Citations: 0
HierClasSArt: Knowledge-Aware Hierarchical Classification of Scholarly Articles
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451365
Mehwish Alam, Russa Biswas, Yiyi Chen, D. Dessí, Genet Asefa Gesese, Fabian Hoppe, Harald Sack
The huge number of scholarly articles published every day in different domains makes it hard for experts to organize and stay up to date with new research in a particular domain. This study gives an overview of a new approach, HierClasSArt, for knowledge-aware hierarchical classification of scholarly articles in mathematics into a predefined taxonomy. The method uses a combination of neural networks and knowledge graphs, along with meta-data, for better document representation. This position paper further discusses the open problems of incorporating new articles and evolving hierarchies into the pipeline. The mathematics domain is used as a use case.
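As a toy illustration of classifying a document into a predefined taxonomy top-down (the taxonomy, keyword scorer, and node names below are invented placeholders; the actual approach uses neural networks plus a knowledge graph for document representation):

```python
# Invented toy taxonomy: each node maps to its child nodes; leaves have none.
TAXONOMY = {
    "mathematics": ["algebra", "analysis"],
    "algebra": ["group theory"],
    "analysis": [],
    "group theory": [],
}

def classify(doc_keywords, node="mathematics"):
    """Walk the taxonomy top-down, at each node picking the child whose
    name best matches the document keywords, until a leaf is reached."""
    children = TAXONOMY[node]
    if not children:
        return node
    best = max(children, key=lambda c: sum(k in c for k in doc_keywords))
    return classify(doc_keywords, best)
```

In a real system, the keyword-overlap scorer at each node would be replaced by a learned classifier over neural document and knowledge-graph embeddings.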
Citations: 4
Inferring Sociodemographic Attributes of Wikipedia Editors: State-of-the-art and Implications for Editor Privacy
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452350
S. Brückner, F. Lemmerich, M. Strohmaier
In this paper, we investigate state-of-the-art machine learning models for inferring sociodemographic attributes of Wikipedia editors based on their public profile pages, and the corresponding implications for editor privacy. To build models for inferring sociodemographic attributes, ground-truth labels are obtained via different strategies, using publicly disclosed information from editor profile pages. Different embedding techniques are used to derive features from editors’ profile texts. In comparative evaluations of different machine learning models, we show that the highest prediction accuracy can be obtained for the attribute gender, with precision values of 82% and 91% for women and men respectively, as well as an averaged F1-score of 0.78. For other attributes, like age group, education, and religion, the classifiers exhibit F1-scores in the range of 0.32 to 0.74, depending on the model class. By merely using publicly disclosed information about Wikipedia editors, we highlight issues surrounding editor privacy on Wikipedia and discuss ways to mitigate this problem. We believe our work can help start a conversation about carefully weighing the potential benefits and harms that come with the existence of information-rich, pre-labeled profile pages of Wikipedia editors.
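As a reminder of how the reported numbers relate, the F1-score is the harmonic mean of precision and recall; for example, a precision of 0.82 paired with a recall of 0.75 (the recall here is a hypothetical value, not one reported by the paper) gives an F1 of about 0.78:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```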
Citations: 2
Explainable Demand Forecasting: A Data Mining Goldmine
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453708
Jože M. Rožanec
Demand forecasting is a crucial component of demand management. Value is provided to the organization through accurate forecasts, and through insights into the reasons driving the forecasts, which increase confidence and assist decision-making. In this Ph.D., we aim to develop state-of-the-art demand forecasting models for irregular demand, develop explainability mechanisms that avoid exposing fine-grained information regarding the model features, create a recommender system to assist users in decision-making, and develop mechanisms to enrich knowledge graphs with feedback provided by users through artificial-intelligence-powered feedback modules. We have already developed models that produce accurate forecasts for steady and irregular demand, and an architecture that provides forecast explanations while preserving sensitive information regarding model features. These explanations highlight real-world events that provide insights into the general context captured by the dataset features, while highlighting actionable items and suggesting datasets for future data enrichment.
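A minimal sketch of the kind of privacy-aware explanation described above: a toy linear forecaster whose explanation exposes only human-readable event labels ranked by contribution magnitude, never the raw feature weights. All feature names, weights, and labels below are invented for illustration:

```python
# Invented model weights and event labels (assumptions for illustration).
WEIGHTS = {"promo_active": 40.0, "holiday": -15.0, "trend": 2.0}
EVENT_LABEL = {
    "promo_active": "ongoing promotion",
    "holiday": "public holiday",
    "trend": "long-term trend",
}

def forecast_with_explanation(features, top_k=2):
    """Return a forecast plus an explanation that names the top contributing
    real-world events, without revealing the underlying feature weights."""
    contributions = {f: WEIGHTS[f] * v for f, v in features.items()}
    forecast = sum(contributions.values())
    ranked = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)
    explanation = [EVENT_LABEL[f] for f in ranked[:top_k]]
    return forecast, explanation
```

The caller sees only event labels such as "ongoing promotion"; the sensitive feature-level detail stays inside the model.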
Citations: 4
Automating Fairness Configurations for Machine Learning 自动化机器学习公平性配置
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452301
Haipei Sun, Yiding Yang, Yanying Li, Huihui Liu, Xinchao Wang, Wendy Hui Wang
Recent years have witnessed substantial efforts devoted to ensuring algorithmic fairness for machine learning (ML), spanning from formalizing fairness metrics to designing fairness-enhancing methods. These efforts lead to numerous possible choices in terms of fairness definitions and fairness-enhancing algorithms. However, finding the best fairness configuration (including both fairness definition and fairness-enhancing algorithms) for a specific ML task is extremely challenging in practice. The large design space of fairness configurations combined with the tremendous cost required for fairness deployment poses a major obstacle to this endeavor. This raises an important issue: can we enable automated fairness configurations for a new ML task on a potentially unseen dataset? To this point, we design Auto-Fair, a system that provides recommendations of fairness configurations by ranking all fairness configuration candidates based on their evaluations on prior ML tasks. At the core of Auto-Fair lies a meta-learning model that ranks all fairness configuration candidates by utilizing: (1) a set of meta-features that are derived from both datasets and fairness configurations that were used in prior evaluations; and (2) the knowledge accumulated from previous evaluations of fairness configurations on related ML tasks and datasets. The experimental results on 350 different fairness configurations and 1,500 data samples demonstrate the effectiveness of Auto-Fair.
{"title":"Automating Fairness Configurations for Machine Learning","authors":"Haipei Sun, Yiding Yang, Yanying Li, Huihui Liu, Xinchao Wang, Wendy Hui Wang","doi":"10.1145/3442442.3452301","DOIUrl":"https://doi.org/10.1145/3442442.3452301","url":null,"abstract":"Recent years have witnessed substantial efforts devoted to ensuring algorithmic fairness for machine learning (ML), spanning from formalizing fairness metrics to designing fairness-enhancing methods. These efforts lead to numerous possible choices in terms of fairness definitions and fairness-enhancing algorithms. However, finding the best fairness configuration (including both fairness definition and fairness-enhancing algorithms) for a specific ML task is extremely challenging in practice. The large design space of fairness configurations combined with the tremendous cost required for fairness deployment poses a major obstacle to this endeavor. This raises an important issue: can we enable automated fairness configurations for a new ML task on a potentially unseen dataset? To this point, we design Auto-Fair, a system that provides recommendations of fairness configurations by ranking all fairness configuration candidates based on their evaluations on prior ML tasks. At the core of Auto-Fair lies a meta-learning model that ranks all fairness configuration candidates by utilizing: (1) a set of meta-features that are derived from both datasets and fairness configurations that were used in prior evaluations; and (2) the knowledge accumulated from previous evaluations of fairness configurations on related ML tasks and datasets. The experimental results on 350 different fairness configurations and 1,500 data samples demonstrate the effectiveness of Auto-Fair.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124708042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
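The meta-learning ranker described in the Auto-Fair abstract above can be approximated by a toy sketch: score each candidate fairness configuration for a new dataset by looking up the most similar prior evaluation in meta-feature space. Everything here is an assumption for illustration — the 1-nearest-neighbor scorer, the three-dimensional meta-features, and all numbers; the abstract does not specify Auto-Fair's model at this level of detail.

```python
import numpy as np

# Prior evaluations: each row concatenates dataset meta-features with a
# fairness-configuration feature; scores are the observed quality of that
# configuration on that prior task. All values are invented for illustration.
prior_meta = np.array([[0.2, 0.8, 1.0],
                       [0.5, 0.4, 0.0],
                       [0.4, 0.6, 1.0]])
prior_scores = np.array([0.9, 0.5, 0.8])

def rank_configurations(new_meta, candidates):
    """Score each candidate configuration for a new dataset with the score
    of its nearest prior evaluation (a 1-NN stand-in for a learned ranker),
    then return the candidates best-first."""
    scored = []
    for cfg in candidates:
        meta = np.concatenate([new_meta, cfg])
        dists = np.linalg.norm(prior_meta - meta, axis=1)
        scored.append((tuple(cfg), float(prior_scores[np.argmin(dists)])))
    return sorted(scored, key=lambda t: -t[1])

# Meta-features of a new, unseen dataset, and two candidate configurations.
ranking = rank_configurations(np.array([0.25, 0.75]),
                              [np.array([1.0]), np.array([0.0])])
```

A learned ranking model trained on many such (meta-features, score) pairs would replace the nearest-neighbor lookup, but the data flow — derive meta-features, score every candidate, rank — matches the system the abstract describes.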
Journal: Companion Proceedings of the Web Conference 2021