GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

Deepak Saini, A. Jain, Kushal Dave, Jian Jiao, Amit Singh, Ruofei Zhang, M. Varma
{"title":"GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification","authors":"Deepak Saini, A. Jain, Kushal Dave, Jian Jiao, Amit Singh, Ruofei Zhang, M. Varma","doi":"10.1145/3442381.3449937","DOIUrl":null,"url":null,"abstract":"This paper develops the GalaXC algorithm for Extreme Classification, where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. Extreme classification has been successfully applied to several real world web-scale applications such as web search, product recommendation, query rewriting, etc. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4 × V100 GPUs. This allowed GalaXC to not only scale to applications with several millions of labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being upto 2-50 × faster to train and 10 × faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively. Code for GalaXC is available at https://github.com/Extreme-classification/GalaXC","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"71 5","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442381.3449937","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

This paper develops the GalaXC algorithm for Extreme Classification, where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. Extreme classification has been successfully applied to several real world web-scale applications such as web search, product recommendation, query rewriting, etc. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4 × V100 GPUs. This allowed GalaXC to not only scale to applications with several millions of labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being upto 2-50 × faster to train and 10 × faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively. Code for GalaXC is available at https://github.com/Extreme-classification/GalaXC
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于标签关注的极端分类图神经网络
本文开发了用于极端分类的GalaXC算法,该算法的任务是从一个极大的标签集中使用最相关的标签子集来注释文档。极端分类已经成功地应用于几个现实世界的网络规模应用,如网络搜索、产品推荐、查询重写等。GalaXC发现了两个主要极端分类算法的关键缺陷。首先,现有的方法通常假设文档和标签位于不相交的集合中,即使在几个应用程序中,标签和文档共存于同一空间。第二,有几种方法虽然是可伸缩的,但不利用应用程序提供的各种形式的元数据,如标签文本和标签相关性。为了解决这些问题,GalaXC提出了一个框架,可以在大规模的联合文档标签图上进行协作学习,以一种自然允许各种辅助信息源(包括标签元数据)被合并的方式。GalaXC还引入了一种新颖的标签注意机制,将高容量极端分类器与其框架融合在一起。提出了一种有效的端到端GalaXC实现,在4 × V100 gpu上,可以在不到100小时的时间内对具有50M个标签和97M个训练文档的数据集进行训练。这使得GalaXC不仅可以扩展到具有数百万个标签的应用程序,而且比领先的深度极端分类器的准确率提高了18%,同时训练速度提高了2-50倍,在基准数据集上的预测速度提高了10倍。GalaXC特别适合热启动场景,在这种场景中,需要对部分显示标签集的数据点进行预测,并且被发现比专门为热启动设置设计的极端分类算法准确率高出25%。在Bing搜索引擎上进行的A/B测试中,GalaXC可以将点击率(Click Yield, CY)和覆盖率分别提高1.52%和1.11%。GalaXC的代码可在https://github.com/Extreme-classification/GalaXC上获得
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
WiseTrans: Adaptive Transport Protocol Selection for Mobile Web Service Outlier-Resilient Web Service QoS Prediction Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy Unsupervised Lifelong Learning with Curricula The Structure of Toxic Conversations on Twitter
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1