Proceedings of the Web Conference 2021: Latest Papers (Page 5)

A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Learning

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450142

Yaqing Wang, Quanming Yao, J. Kwok

Matrix learning is at the core of many machine learning problems. A number of real-world applications, such as collaborative filtering and text mining, can be formulated as low-rank matrix completion problems, which recover an incomplete matrix under a low-rank assumption. To ensure that the matrix solution has low rank, a recent trend is to use nonconvex regularizers that adaptively penalize singular values. They offer good recovery performance and have nice theoretical properties, but are computationally expensive due to repeated access to individual singular values. In this paper, based on the key insight that adaptive shrinkage of singular values improves empirical performance, we propose a new nonconvex low-rank regularizer called the “nuclear norm minus Frobenius norm” regularizer, which is scalable, adaptive and sound. We first show that it provably holds the adaptive shrinkage property. Further, we discover its factored form, which bypasses the computation of singular values and allows fast optimization by general optimization algorithms. Stable recovery and convergence are guaranteed. Extensive low-rank matrix completion experiments on a number of synthetic and real-world data sets show that the proposed method obtains state-of-the-art recovery performance while being the fastest in comparison to existing low-rank matrix learning methods.
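As a rough illustration of the regularizer named in this abstract, the Python sketch below evaluates r(X) = ||X||_* - ||X||_F in two ways: directly from singular values, and through a factored surrogate that needs no SVD. The factored expression used here, 0.5 (||W||_F^2 + ||H||_F^2) - ||W H^T||_F with X = W H^T, is an assumption based on the standard variational form of the nuclear norm; the abstract does not give the paper's exact formula. All function names are hypothetical.

```python
import math


def nnfn_from_singular_values(sigmas):
    """Nuclear norm minus Frobenius norm, written in terms of singular values."""
    nuclear = sum(sigmas)                               # sum of singular values
    frobenius = math.sqrt(sum(s * s for s in sigmas))   # l2 norm of singular values
    return nuclear - frobenius


def frobenius_norm(M):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(v * v for row in M for v in row))


def matmul_transpose(W, H):
    """Return W @ H^T for list-of-lists matrices."""
    return [[sum(w * h for w, h in zip(w_row, h_row)) for h_row in H]
            for w_row in W]


def factored_nnfn(W, H):
    """Assumed factored surrogate; it never computes singular values."""
    factor_term = 0.5 * (frobenius_norm(W) ** 2 + frobenius_norm(H) ** 2)
    return factor_term - frobenius_norm(matmul_transpose(W, H))


# X = diag(4, 1) has singular values 4 and 1.
sigmas = [4.0, 1.0]
# Balanced factors W = H = diag(2, 1), so that W @ H^T = X.
W = [[2.0, 0.0], [0.0, 1.0]]
H = [[2.0, 0.0], [0.0, 1.0]]

direct = nnfn_from_singular_values(sigmas)
factored = factored_nnfn(W, H)
```

With balanced factors the two values agree on this toy matrix. For a rank-1 matrix the penalty is zero, while additional nonzero singular values make it positive, which is one way to see why the regularizer favors low rank.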



Citations: 6

WiseTrans: Adaptive Transport Protocol Selection for Mobile Web Service

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449958

Jia Zhang, Enhuan Dong, Zili Meng, Yuan Yang, Mingwei Xu, Sijie Yang, Miao Zhang, Yang Yue

To improve the performance of mobile web service, a new transport protocol, QUIC, has been recently proposed. However, for large-scale real-world deployments, deciding whether and when to use QUIC in mobile web service is challenging. Complex temporal correlation of network conditions, high spatial heterogeneity of users in a nationwide deployment, and limited resources on mobile devices all affect the selection of transport protocols. In this paper, we present WiseTrans to adaptively switch transport protocols for mobile web service online and improve the completion time of web requests. WiseTrans introduces machine learning techniques to deal with temporal heterogeneity, makes decisions with historical information to handle spatial heterogeneity, and switches transport protocols at the request level to reach both high performance and acceptable overhead. We implement WiseTrans on two platforms (Android and iOS) in a popular mobile web service application of Baidu. Comprehensive experiments demonstrate that WiseTrans can reduce request completion time by up to 26.5% on average compared to the usage of a single protocol.



Citations: 6

Dr.Emotion: Disentangled Representation Learning for Emotion Analysis on Social Media to Improve Community Resilience in the COVID-19 Era and Beyond

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449961

Mingxuan Ju, Wei Song, Shiyu Sun, Yanfang Ye, Yujie Fan, Shifu Hou, K. Loparo, Liang Zhao

During the pandemic caused by coronavirus disease (COVID-19), social media has played an important role by enabling people to discuss their experiences of and feelings about this global crisis. To help combat the prolonged pandemic, which has exposed vulnerabilities impacting community resilience, in this paper, based on our established large-scale COVID-19 related social media data, we propose and develop an integrated framework (named Dr.Emotion) to learn disentangled representations of social media posts (i.e., tweets) for emotion analysis and thus to gain deep insights into public perceptions of COVID-19. In Dr.Emotion, for given social media posts, we first post-train a transformer-based model to obtain the initial post embeddings. Users may express their emotions implicitly in social media posts, where they can be highly entangled with other descriptive information in the post content. To address this challenge for emotion analysis, we propose, as a first attempt, an adversarial disentangler that integrates emotion-independent (i.e., sentiment-neutral) priors of the posts, generated by another post-trained transformer-based model, to separate the implicitly encoded emotions from the content in latent space for emotion classification. Extensive experimental studies are conducted to fully evaluate Dr.Emotion, and promising results demonstrate its performance in emotion analysis by comparison with state-of-the-art baseline methods. By exploiting Dr.Emotion, we further perform emotion analysis over a large number of social media posts and provide an in-depth investigation from both temporal and geographical perspectives, based on which additional work can be conducted to extract and transform constructive ideas, experiences and support into actionable information to improve community resilience in response to a variety of crises created by COVID-19 and well beyond.



Citations: 8

One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449809

Shahroz Tariq, Sangyup Lee, Simon S. Woo

Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and explores spatial as well as temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment. In contrast, our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset, collected from the Internet, containing numerous videos and more than 150,000 frames. Our CLRNet model generalizes well to high-quality DFW videos, achieving 93.86% detection accuracy and outperforming existing state-of-the-art defense methods by a considerable margin.



Citations: 51

“Is it a Qoincidence?”: An Exploratory Study of QAnon on Voat

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450036

Antonis Papasavva, Jeremy Blackburn, G. Stringhini, Savvas Zannettou, Emiliano De Cristofaro

Online fringe communities offer fertile grounds to users seeking and sharing ideas fueling suspicion of mainstream news and conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. Simultaneously, governments are thought to be controlled by “puppet masters,” as democratically elected officials serve as a fake showroom of democracy. This paper provides an empirical exploratory analysis of the QAnon community on Voat.co, a Reddit-esque news aggregator, which has captured the interest of the press for its toxicity and for providing a platform to QAnon followers. More precisely, we analyze a large dataset from /v/GreatAwakening, the most popular QAnon-related subverse (the Voat equivalent of a subreddit), to characterize activity and user engagement. To further understand the discourse around QAnon, we study the most popular named entities mentioned in the posts, along with the most prominent topics of discussion, which focus on US politics, Donald Trump, and world events. We also use word embeddings to identify narratives around QAnon-specific keywords. Our graph visualization shows that some of the QAnon-related ones are closely related to those from the Pizzagate conspiracy theory and so-called drops by “Q.” Finally, we analyze content toxicity, finding that discussions on /v/GreatAwakening are less toxic than in the broad Voat community.



Citations: 36

Peer Grading the Peer Reviews: A Dual-Role Approach for Lightening the Scholarly Paper Review Process

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450088

Ines Arous, Jie Yang, Mourad Khayati, P. Cudré-Mauroux

Scientific peer review is pivotal to maintaining quality standards for academic publication. The effectiveness of the reviewing process is currently being challenged by the rapid increase of paper submissions in various conferences. Those venues need to recruit a large number of reviewers with different levels of expertise and background. The submitted reviews often do not meet the conformity standards of the conferences. Such a situation places an ever-bigger burden on the meta-reviewers when trying to reach a final decision. In this work, we propose a human-AI approach that estimates the conformity of reviews to the conference standards. Specifically, we ask peers to grade each other’s reviews anonymously with respect to important criteria of review conformity such as sufficient justification and objectivity. We introduce a Bayesian framework that learns the conformity of reviews from the peer grading process as well as from the historical reviews and decisions of a conference, while taking into account grading reliability. Our approach helps meta-reviewers easily identify reviews that require clarification and detect submissions requiring discussion, without inducing additional overhead for reviewers. Through a large-scale crowdsourced study where crowd workers are recruited as graders, we show that the proposed approach outperforms machine learning or review grades alone and that it can be easily integrated into existing peer review systems.



Citations: 6

BRIGHT: A Bridging Algorithm for Network Alignment

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450053

Yuchen Yan, Si Zhang, Hanghang Tong

Multiple networks emerge in a wealth of high-impact applications. Network alignment, which aims to find the node correspondence across different networks, plays a fundamental role in many data mining tasks. Most of the existing methods can be divided into two categories: (1) consistency optimization based methods, which often explicitly assume the alignment to be consistent in terms of neighborhood topology and attributes across networks, and (2) network embedding based methods, which learn low-dimensional node embedding vectors to infer alignment. In this paper, by analyzing representative methods of these two categories, we show that (1) the consistency optimization based methods are essentially specific random walk propagations from anchor links that might be too restrictive; (2) the embedding based methods no longer explicitly assume alignment consistency but inevitably suffer from the space disparity issue. To overcome these two limitations, we bridge these methods and propose a novel family of network alignment algorithms, BRIGHT, to handle both plain and attributed networks. Specifically, it constructs a space by random walk with restart (RWR) whose bases are one-hot encoding vectors of anchor nodes, followed by a shared linear layer. Our experiments on real-world networks show that the proposed family of algorithms, BRIGHT, outperforms the state of the art for both plain and attributed network alignment tasks.
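The random-walk-with-restart step mentioned in this abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: each anchor node gets a one-hot start vector, RWR is run by power iteration, and every node is then described by its scores with respect to all anchors. The restart probability of 0.15, the iteration count, the toy graph and all function names are assumptions, and the shared linear layer that follows in BRIGHT is omitted.

```python
def rwr_scores(adj, seed, restart=0.15, iters=200):
    """Random walk with restart from one seed node on an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with no
    isolated nodes. Returns approximate stationary visiting probabilities.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    scores = [0.0] * n
    scores[seed] = 1.0  # one-hot start vector for the seed (anchor) node
    for _ in range(iters):
        updated = []
        for i in range(n):
            # Probability mass arriving at node i from its neighbours.
            walk = sum(adj[j][i] * scores[j] / degree[j] for j in range(n))
            # With probability `restart` the walker jumps back to the seed.
            jump = restart if i == seed else 0.0
            updated.append((1.0 - restart) * walk + jump)
        scores = updated
    return scores


def anchor_embeddings(adj, anchors, restart=0.15):
    """Embed each node as its RWR score with respect to every anchor node."""
    per_anchor = [rwr_scores(adj, a, restart) for a in anchors]
    return [[col[i] for col in per_anchor] for i in range(len(adj))]


# Toy path graph 0 - 1 - 2 - 3, with nodes 0 and 3 treated as anchors.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
embeddings = anchor_embeddings(path, anchors=[0, 3])
```

Because the coordinates are tied to anchor nodes rather than learned freely, running the same procedure on a second network with the matching anchors would place both networks' nodes in a comparable space, which is the intuition behind using anchors as bases.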



Citations: 41

Generating Accurate Caption Units for Figure Captioning

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449923

Xin Qian, Eunyee Koh, F. Du, Sungchul Kim, Joel Chan, Ryan A. Rossi, Sana Malik, Tak Yeon Lee

Scientific-style figures are commonly used on the web to present numerical information. Captions that convey figure information accurately and sound natural would significantly improve figure accessibility. In this paper, we present promising results on machine figure captioning. A recent corpus analysis of real-world captions reveals that machine figure captioning systems should start by generating accurate caption units. We formulate the caption unit generation problem as a controlled captioning problem. Given a caption unit type as a control signal, a model generates an accurate caption unit of that type. As a proof-of-concept on single bar charts, we propose a model, FigJAM, that achieves this goal by using metadata and a joint static and dynamic dictionary. Quantitative evaluations with two datasets from the figure question answering task show that our model can generate more accurate caption units than competitive baseline models. A user study with ten human experts confirms the value of machine-generated caption units in their standalone accuracy and naturalness. Finally, a post-editing simulation study demonstrates the potential for models to paraphrase and stitch together single-type caption units into multi-type captions by learning from data.



Citations: 24

Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449800

Aaron Schein, Keyon Vafa, Dhanya Sridhar, Victor Veitch, Jeffrey M. Quinn, James Moffet, D. Blei, D. Green

Recent mobile app technology lets people systematize the process of messaging their friends to urge them to vote. Prior to the most recent US midterm elections in 2018, the mobile app Outvote randomized an aspect of their system, hoping to unobtrusively assess the causal effect of their users’ messages on voter turnout. However, properly assessing this causal effect is hindered by multiple statistical challenges, including attenuation bias due to mismeasurement of subjects’ outcomes and low precision due to two-sided non-compliance with subjects’ assignments. We address these challenges, which are likely to impinge upon any study that seeks to randomize authentic friend-to-friend interactions, by tailoring the statistical analysis to make use of additional data about both users and subjects. Using metadata of users’ in-app behavior, we reconstruct subjects’ positions in users’ queues. We use this information to refine the study population to more compliant subjects who were higher in the queues, and we do so in a systematic way that optimizes a proxy for the study’s power. To mitigate attenuation bias, we then use ancillary data on subjects’ matches to the voter rolls, which lets us refine the study population to one with low rates of outcome mismeasurement. Our analysis reveals statistically significant treatment effects from friend-to-friend mobilization efforts (8.3, CI = (1.2, 15.3)) that are among the largest reported in the get-out-the-vote (GOTV) literature. While social pressure from friends has long been conjectured to play a role in effective GOTV treatments, the present study is among the first to assess these effects experimentally.
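
Two-sided non-compliance means that some subjects assigned to be messaged were not messaged, and some assigned to control were. A common textbook way to handle this is the Wald (instrumental-variables) estimator, which divides the intent-to-treat effect on turnout by the difference in messaging rates between the two arms. The sketch below assumes that estimator purely for illustration: the abstract does not say which estimator the authors use, and the function name `wald_estimate` and every number in the example are hypothetical.

```python
def wald_estimate(turnout_assigned: float,
                  turnout_control: float,
                  messaged_assigned: float,
                  messaged_control: float) -> float:
    """Textbook Wald estimator for a randomized study with non-compliance.

    turnout_*  : mean turnout among subjects assigned to each arm
    messaged_* : share of subjects in each arm who were actually messaged
    All four inputs are proportions in [0, 1].
    """
    # Intent-to-treat effect of assignment on the outcome (turnout).
    itt_outcome = turnout_assigned - turnout_control
    # Effect of assignment on treatment uptake (being messaged).
    itt_uptake = messaged_assigned - messaged_control
    if itt_uptake <= 0:
        raise ValueError("assignment must raise the messaging rate")
    # Scale the diluted intent-to-treat effect by the compliance gap.
    return itt_outcome / itt_uptake


# Hypothetical numbers, not taken from the study.
effect = wald_estimate(turnout_assigned=0.52, turnout_control=0.50,
                       messaged_assigned=0.30, messaged_control=0.05)
print(round(effect, 3))  # about 0.08, i.e. roughly 8 percentage points
```

Because the denominator shrinks as compliance falls, the same amount of noise in measured turnout produces a noisier estimate, which is one way to see why refining the study population to more compliant subjects can improve precision.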



Citations: 1

Mixup for Node and Graph Classification

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449796

Yiwei Wang, Wei Wang, Yuxuan Liang, Yujun Cai, Bryan Hooi

Mixup is an advanced data augmentation method for training neural-network-based image classifiers, which interpolates both features and labels of a pair of images to produce synthetic samples. However, devising Mixup methods for graph learning is challenging due to the irregularity and connectivity of graph data. In this paper, we propose Mixup methods for two fundamental tasks in graph learning: node and graph classification. To interpolate the irregular graph topology, we propose a two-branch graph convolution to mix the receptive field subgraphs for the paired nodes. Because nodes are connected, Mixup on different node pairs can interfere with each other’s mixed features. To block this interference, we propose a two-stage Mixup framework, which uses each node’s neighbors’ representations before Mixup for graph convolutions. For graph classification, we interpolate complex and diverse graphs in the semantic space. Qualitatively, our Mixup methods enable GNNs to learn more discriminative features and reduce over-fitting. Quantitative results show that our method yields consistent gains in terms of test accuracy and F1-micro scores on standard datasets, for both node and graph classification. Overall, our method effectively regularizes popular graph neural networks for better generalization without increasing their time complexity.
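
The interpolation described in the first sentence of this abstract can be sketched as follows. This is a minimal sketch of standard image-style Mixup only, not the two-branch graph convolution or the two-stage framework the paper proposes. The function name `mixup` and the toy vectors are illustrative assumptions. In common practice the mixing weight is drawn from a Beta(α, α) distribution; here it is passed in explicitly so the example stays deterministic.

```python
from typing import List, Sequence, Tuple


def mixup(x_a: Sequence[float], y_a: Sequence[float],
          x_b: Sequence[float], y_b: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Blend two samples: lam * sample_a + (1 - lam) * sample_b.

    x_* are feature vectors and y_* are one-hot (or soft) label vectors.
    The same weight lam is applied to features and labels.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("paired samples must have matching shapes")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


# Toy example: two 2-dimensional samples from two different classes.
features, labels = mixup(x_a=[1.0, 0.0], y_a=[1.0, 0.0],
                         x_b=[0.0, 2.0], y_b=[0.0, 1.0],
                         lam=0.25)
print(features)  # [0.25, 1.5]
print(labels)    # [0.25, 0.75]
```

The synthetic label is a soft label: the mixed sample above counts as 25% class 0 and 75% class 1. This plain vector interpolation assumes the two inputs share a regular shape, which is the property graph data lacks and the reason the abstract calls the graph setting challenging.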



Citations: 106