Attribute Similarity and Relevance-Based Product Schema Matching for Targeted Catalog Enrichment

Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu
Published in: 2021 IEEE International Conference on Big Knowledge (ICBK)
Publication date: 2021-12-01
DOI: 10.1109/ICKG52313.2021.00043 (https://doi.org/10.1109/ICKG52313.2021.00043)

Abstract

Many eCommerce catalogs rely on structured product data to provide a good experience for customers. For large-scale services, product information is provided through millions of different manufacturer and vendor schemas. Because this data is inherently heterogeneous, unifying it into a consistent catalog schema remains a challenge. Schema matching is the problem of finding such correspondences between concepts in different distributed, heterogeneous data sources. Most approaches to automated schema matching assume a small number of source schemas, attributes, and contexts (e.g., matching movie attributes from media knowledge bases). By contrast, schema matching in product catalogs encounters the problem of scaling across millions of noisy, heterogeneous schemas spanning thousands of categories and attributes. In this paper, we introduce a scalable schema matching framework that utilizes unsupervised domain-specific attribute representations and general attribute similarity metrics. Our method first identifies relevant attributes for a given product based on existing customer signals, and then prioritizes among candidate attributes to consolidate only those relevant product facts from multiple manufacturers and vendors, using little to no labeled data. We demonstrate the value of this approach through experiments that enriched catalog data containing millions of attribute enumerations sourced from tens of thousands of schemas across a wide range of product categories. Experimental results show that automating schema matching on targeted product facts reduces manual annotation effort by 75% relative to competing schema matching efforts, while achieving high accuracy, precision, and recall for the important attributes that contribute to customer interest. We also demonstrate an improvement of 8% in mean reciprocal rank (MRR) using our approach compared against two well-established approaches to unsupervised schema matching.
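As a rough illustration of the ideas named in the abstract, the sketch below ranks candidate catalog attributes for each source attribute and scores the rankings with MRR. It is not the paper's method: Jaccard overlap of value sets stands in for the unsupervised attribute representations and similarity metrics, which the abstract does not specify, and all function names and toy data are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of attribute values (assumed metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(source_values: Set[str],
                    catalog: Dict[str, Set[str]]) -> List[str]:
    """Rank catalog attributes by value overlap with one source attribute.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(
        catalog,
        key=lambda name: (-jaccard(source_values, catalog[name]), name),
    )


def mean_reciprocal_rank(rankings: List[List[str]], gold: List[str]) -> float:
    """Average of 1/rank of the correct match; a missing match contributes 0."""
    total = 0.0
    for ranked, truth in zip(rankings, gold):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(gold) if gold else 0.0


# Hypothetical toy data: a target catalog schema and two vendor attributes.
catalog = {
    "color": {"red", "blue", "black"},
    "material": {"cotton", "leather", "steel"},
    "size": {"s", "m", "l"},
}
sources = {
    "colour_name": {"red", "black", "green"},
    "fabric": {"cotton", "wool"},
}
gold = ["color", "material"]  # assumed correct match for each source attribute

rankings = [rank_candidates(values, catalog) for values in sources.values()]
mrr = mean_reciprocal_rank(rankings, gold)
print(rankings, mrr)
```

MRR rewards a matcher for placing the correct catalog attribute near the top of each ranked list, which suits a workflow where a human annotator reviews only the first few candidates.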