Companion Proceedings of the Web Conference 2021: Latest Articles

BrFAST: a Tool to Select Browser Fingerprinting Attributes for Web Authentication According to a Usability-Security Trade-off
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3458610
Nampoina Andriamilanto, T. Allard
In this demonstration, we put ourselves in the place of a website manager who seeks to use browser fingerprinting for web authentication. The first step is to choose the attributes to implement among the hundreds that are available. To do so, we developed BrFAST, an attribute selection platform that includes FPSelect, an algorithm that rigorously selects the attributes according to a trade-off between security and usability. BrFAST is configured with a set of parameters, for which we provide values so that BrFAST is usable as is. We notably include the resources needed to use two publicly available browser fingerprint datasets. BrFAST can be extended to use other parameters: other attribute selection methods, other measures of security and usability, or other fingerprint datasets. BrFAST helps visualize the exploration of the possibilities during the search for the best attribute set to use, evaluate the properties of attribute sets, and compare several attribute selection methods. During the demonstration, we compare the attribute sets selected by FPSelect with those selected by the usual methods according to the properties of the resulting browser fingerprints (e.g., their usability, their unicity).
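One of the fingerprint properties named in the abstract, unicity, can be illustrated with a minimal sketch. This is not FPSelect and not BrFAST's API; the function name `unicity`, the attribute names, and the toy dataset are all assumptions made for illustration. The sketch only measures what share of fingerprints become unique when a candidate attribute set is kept.

```python
from collections import Counter


def unicity(fingerprints, attributes):
    """Share of fingerprints that are unique when only `attributes` are kept.

    Hypothetical helper for illustration; not part of BrFAST or FPSelect.
    """
    # Project every fingerprint onto the candidate attribute set.
    projected = [tuple(fp[a] for a in attributes) for fp in fingerprints]
    counts = Counter(projected)
    # A fingerprint is unique if no other browser shares its projected values.
    unique = sum(1 for p in projected if counts[p] == 1)
    return unique / len(projected)


# Toy dataset (assumed): one dict of attribute values per browser.
fingerprints = [
    {"user_agent": "A", "timezone": "UTC+1", "language": "fr"},
    {"user_agent": "A", "timezone": "UTC+1", "language": "en"},
    {"user_agent": "B", "timezone": "UTC+1", "language": "fr"},
    {"user_agent": "B", "timezone": "UTC+2", "language": "fr"},
]

print(unicity(fingerprints, ["user_agent"]))                          # 0.0
print(unicity(fingerprints, ["user_agent", "timezone"]))              # 0.5
print(unicity(fingerprints, ["user_agent", "timezone", "language"]))  # 1.0
```

Adding attributes raises unicity here, which hints at the trade-off the abstract describes: each extra attribute may make fingerprints more distinctive, but it also has to be collected and stored, which is where a usability measure would come in.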
Citations: 6
Enhancing Intent Detection in Customer Service with Social Media Data
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451377
JianTao Huang, Yi-Ru Liou, Hsin-Hsi Chen
Intent detection plays an important role in customer service dialog systems for providing high-quality service in the financial industry. The lack of publicly available datasets and high annotation cost are two challenging issues in this research direction. To overcome these challenges, we propose a social media enhanced self-training approach for intent detection by using label names only. The experimental results show the effectiveness of the proposed method.
Citations: 2
Proposing a Broader Scope of Predictive Features for Modeling Refugee Counts
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453457
Esther Mead, Maryam Maleki, Recep Erol, Dr Nidhi Agarwal
The world-wide refugee problem has a long history, but continues to this day, and will unfortunately continue into the foreseeable future. Efforts to anticipate, mitigate and prepare for refugee counts, however, are still lacking. There are many potential causes, but the published research has primarily focused on identifying ways to integrate already existing refugees into the various communities wherein they ultimately reside, rather than on preventive measures. The work proposed herein uses a set of features that can be divided into three basic categories: 1) sociocultural, 2) socioeconomic, and 3) economic, which refer to the nature of each proposed predictive feature. For example, corruption perception is a sociocultural feature, access to healthcare is a socioeconomic feature, and inflation is an economic feature. Forty-five predictive features were collected for various years and countries of interest. As may seem intuitive, the features that fell under the category of "economic" produced the highest predictive value from the regression technique employed. However, additional potential predictive features that have not been previously addressed stood out in our experiments. These include: the global peace index (gpi), freedom of expression (fe), internet users (iu), access to healthcare (hc), cost of living index (coli), local purchasing power index (lppi), homicide rate (hr), access to justice (aj), and women's property rights (wpr). Many of these features are nascent in both their development and their collection, and some are not yet collected at a universal level, meaning that the data is missing for some countries and years. Ongoing work regarding these datasets for predicting refugee counts is also discussed in this work.
Citations: 0
ShExStatements: Simplifying Shape Expressions for Wikidata
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452349
J. Samuel
Wikidata recently added support for entity schemas based on shape expressions (ShEx). They play an important role in the validation of items belonging to a multitude of domains on Wikidata. However, the number of entity schemas created by the contributors is relatively low compared to the number of WikiProjects. The past couple of years have seen attempts at simplifying the shape expressions and building tools for creating them. In this article, ShExStatements is presented with the goal of simplifying the writing of shape expressions for Wikidata.
Citations: 2
A Brief Analysis of Bengali Wikipedia’s Journey to 100,000 Articles
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452340
Ankan Ghosh Dastider
The Bengali Wikipedia crossed the milestone of 100,000 articles in December 2020, after a journey of almost 17 years. Over this journey, the Bengali-language edition of the world’s largest encyclopedia has experienced multiple changes, with a promising increase in overall performance in terms of the growth of community members and content. This paper analyzes the various associated factors throughout this journey, including the number of active editors, the number of content pages, page views, etc., along with the connection between outreach activities and these parameters. The gender gap has been a worldwide problem and is quite prevalent in Bengali Wikipedia as well; it seems to have remained unchanged over the years, consequently leaving a conspicuous disparity in the movement. The paper inspects the present scenario of Bengali Wikipedia through quantitative factors, with a relative comparison with other regional languages.
Citations: 0
Structural Analysis of Wikigraph to Investigate Quality Grades of Wikipedia Articles
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452345
Anamika Chhabra, S. Srivastava, S. Iyengar, P. Saini
The quality of Wikipedia articles is evaluated manually, which is time-inefficient as well as susceptible to human bias. An automated assessment of these articles may help in minimizing the overall time and manual errors. In this paper, we present a novel approach based on the structural analysis of Wikigraph to automate the estimation of the quality of Wikipedia articles. We examine the network built using the complete set of English Wikipedia articles and identify the variation of network signatures of the articles with respect to their quality. Our study shows that these signatures are useful for estimating the quality grades of un-assessed articles with an accuracy surpassing the existing approaches in this direction. The results of the study may help in reducing the need for human involvement in quality assessment tasks.
Citations: 2
Does Gender Matter in the News? Detecting and Examining Gender Bias in News Articles
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452325
Jamell Dacon, Haochen Liu
To attract unsuspecting readers, news article headlines and abstracts are often written with speculative sentences or clauses. Male dominance in the news is very evident, whereas females are seen as “eye candy” or “inferior”, and are underrepresented and under-examined within the same news categories as their male counterparts. In this paper, we present an initial study on gender bias in news abstracts in two large English news datasets used for news recommendation and news classification. We perform three large-scale yet effective text-analysis fairness measurements on 296,965 news abstracts. In particular, to our knowledge, we construct two of the largest benchmark datasets of possessive (gender-specific and gender-neutral) nouns and attribute (career-related and family-related) words, which we will release to foster both bias and fairness research and to aid in developing fair NLP models to eliminate the paradox of gender bias. Our studies demonstrate that females are immensely marginalized and suffer from socially-constructed biases in the news. This paper individually devises a methodology whereby news content can be analyzed on a large scale utilizing natural language processing (NLP) techniques from machine learning (ML) to discover both implicit and explicit gender biases.
Citations: 14
Towards Ongoing Detection of Linguistic Bias on Wikipedia
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452353
K. Madanagopal, James Caverlee
Wikipedia is a critical platform for organizing and disseminating knowledge. One of the key principles of Wikipedia is neutral point of view (NPOV), so that bias is not injected into objective treatment of subject matter. As part of our research vision to develop resilient bias detection models that can self-adapt over time, we present in this paper our initial investigation of the potential of a cross-domain transfer learning approach to improve Wikipedia bias detection. The ultimate goal is to future-proof Wikipedia in the face of dynamic, evolving kinds of linguistic bias and adversarial manipulations intended to evade NPOV issues. We highlight the impact of incorporating evidence of bias from other subjectivity-rich domains into the further pre-training of a BERT-based model, resulting in strong performance in comparison with traditional methods.
Citations: 0
Finding Keystone Citations for Constructing Validity Chains among Research Papers
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451368
Yuanxi Fu, Jodi Schneider, Catherine Blake
New discoveries in science are often built upon previous knowledge. Ideally, such dependency information should be made explicit in a scientific knowledge graph. The Keystone Framework was proposed for tracking the validity dependency among papers. A keystone citation indicates that the validity of a given paper depends on a previously published paper it cites. In this paper, we propose and evaluate a strategy that repurposes rhetorical category classifiers for the novel application of extracting keystone citations that relate to research methods. Five binary rhetorical category classifiers were constructed to identify Background, Objective, Methods, Results, and Conclusions sentences in biomedical papers. The resulting classifiers were used to test the strategy against two datasets. The initial strategy assumed that only citations contained in Methods sentences were methods keystone citations, but our analysis revealed that citations contained in sentences classified as either Methods or Results had a high likelihood of being methods keystone citations. Future work will focus on fine-tuning the rhetorical category classifiers, experimenting with multiclass classifiers, evaluating the revised strategy with more data, and constructing a larger gold-standard citation context sentence dataset for model training.
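The difference between the initial and the revised strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the `Sentence` type, the function `candidate_keystone_citations`, and the example sentences are hypothetical, and the category labels are assumed to come from classifiers like the five binary ones described above, which are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    # Labels assigned by the binary classifiers; a sentence may carry several.
    categories: set
    # Citation markers found in the sentence (hypothetical format).
    citations: list = field(default_factory=list)


def candidate_keystone_citations(sentences, keep=frozenset({"Methods"})):
    """Collect citations from sentences whose categories intersect `keep`.

    The default mirrors the initial strategy (Methods sentences only).
    """
    found = []
    for sentence in sentences:
        if sentence.categories & keep:
            for citation in sentence.citations:
                if citation not in found:  # keep first-seen order, no repeats
                    found.append(citation)
    return found


# Assumed example input, already labelled.
sentences = [
    Sentence("Prior work reported X [1].", {"Background"}, ["[1]"]),
    Sentence("We followed the protocol of [2].", {"Methods"}, ["[2]"]),
    Sentence("Yields matched the assay in [3].", {"Results"}, ["[3]"]),
    Sentence("Samples were prepared as in [2].", {"Methods", "Results"}, ["[2]"]),
]

initial = candidate_keystone_citations(sentences)
revised = candidate_keystone_citations(
    sentences, keep=frozenset({"Methods", "Results"})
)
print(initial)  # ['[2]']
print(revised)  # ['[2]', '[3]']
```

Passing the kept categories as a parameter makes it easy to compare the two strategies on the same labelled sentences; the output is a list of candidates, not a verdict on which citations are truly keystone citations.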
Citations: 0
Improving Bounce Rate Prediction for Rare Queries by Leveraging Landing Page Signals
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3453540
Yeshi Dolma, Raunak Kalani, Astha Agrawal, Saurav Basu
Bounce rate prediction for clicked ads in sponsored search advertising is crucial for improving the quality of ads shown to the user. Bounce rate represents the proportion of landing pages for clicked ads on which users spend less than a specified time, signifying that the user did not find a possible match between their query intent and the landing page content. In the pay-per-click revenue model for search engines, higher bounce rates mean advertisers get charged without meaningful user engagement, which impacts user and advertiser retention in the long term. In real-time search engine settings, complex ML models are prohibitive due to stringent latency requirements. Also, historical logs are ineffective for rare (tail) queries, where the data is sparse, as well as for matching user intent to ad copy when the query and bidded keywords don’t exactly overlap (smart match). In this paper, we propose a real-time bounce rate prediction system that leverages lightweight features, such as modified tf, positional, and proximity features computed from ad landing pages, and improves prediction for rare queries. The model preserves privacy and uses no user-based features. The entire ensemble is trained on millions of examples from the offline user log of the Bing commercial search engine and improves the ranking metrics for tail queries and smart match by more than 2x compared to a model that only uses ad-copy-advertiser features.
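The bounce rate definition given in the abstract can be written down as a short sketch. The function name `bounce_rate`, the dwell times, and the 5-second threshold are assumptions for illustration; the abstract does not state which threshold the authors use, and this computes the observed rate from logged clicks, not the paper's prediction model.

```python
def bounce_rate(dwell_times_s, threshold_s):
    """Fraction of ad clicks whose landing-page dwell time is below a threshold.

    `dwell_times_s` holds one dwell time in seconds per clicked ad.
    """
    if not dwell_times_s:
        raise ValueError("at least one click is required")
    # A click counts as a bounce when the user leaves before the threshold.
    bounces = sum(1 for t in dwell_times_s if t < threshold_s)
    return bounces / len(dwell_times_s)


# Assumed dwell times for four clicks on the same ad.
dwell = [2.0, 45.0, 3.5, 120.0]
print(bounce_rate(dwell, threshold_s=5.0))  # 0.5
```

For a rare query there may be only a handful of such clicks, so the observed rate is noisy, which is the sparsity problem that motivates predicting it from landing page content instead.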
Citations: 2