Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5): Latest Papers

Spelling Correction using Phonetics in E-commerce Search
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.9
Fan Yang, Alireza Bagheri Garakani, Yifei Teng, Yanling Gao, Jia Liu, Jingyuan Deng, Yi Sun
In e-commerce search, spelling correction plays an important role in processing user-typed queries and helping customers find the products they want. However, resolving phonetic errors is a critical but much overlooked area. A query with phonetic spelling errors tends to look correct when pronounced but is nonetheless misspelled (e.g., “blutut sant sistam” for “bluetooth sound system”); such errors take numerous noisy forms, and each form occurs sparsely. In this work, we propose a generalized spelling correction system that integrates phonetics to address phonetic errors in e-commerce search without additional latency cost. Using the India (IN) e-commerce market for illustration, the experiment shows that our proposed phonetic solution significantly improves the F1 score by 9%+ and recall of phonetic errors by 8%+. This phonetic spelling correction system has been deployed to production, currently serving hundreds of millions of customers.
Citations: 4
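The phonetic-matching idea in the abstract above can be illustrated with a small sketch. The abstract does not say which phonetic representation the authors use, so this example swaps in classic Soundex as a stand-in; the function names `soundex` and `correct_query` and the three-word vocabulary are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: classic Soundex is used as a stand-in phonetic key.
# It is NOT claimed to be the method of the paper listed above.
_GROUPS = [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6")]
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character Soundex key of a word ('' for no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def correct_query(query, vocabulary):
    """Replace each token by the first vocabulary word with the same key."""
    index = {}
    for word in vocabulary:
        index.setdefault(soundex(word), word)
    return " ".join(index.get(soundex(tok), tok) for tok in query.split())


VOCAB = ["bluetooth", "sound", "system"]  # hypothetical catalog vocabulary
corrected = correct_query("blutut sant sistam", VOCAB)
print(corrected)  # bluetooth sound system
```

A lookup by precomputed phonetic key is a constant-time dictionary access per token, which is one way a phonetic component could avoid adding noticeable latency; a production system would also need ranking among the many words that share a key.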
CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.5
B. Dong, Yiyi Wang, Hanbo Sun, Yunji Wang, Alireza Hashemi, Zheng Du
Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.
Citations: 7
Interactive Latent Knowledge Selection for E-Commerce Product Copywriting Generation
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.2
Zeming Wang, Yanyan Zou, Yuejian Fang, Hongshen Chen, Mian Ma, Zhuoye Ding, Bo Long
As multi-modal e-commerce thrives, high-quality advertising product copywriting has gained more attention, as it plays a crucial role in e-commerce recommender, advertising, and even search platforms. Advertising product copywriting can enhance the user experience by highlighting the product’s characteristics with textual descriptions and thus improve the likelihood of user clicks and purchases. Automatically generating product copywriting has attracted noticeable interest from both academic and industrial communities, where existing solutions merely make use of a product’s title and attribute information to generate its corresponding description. However, in addition to the product title and attributes, we observe that there are various auxiliary descriptions created by shoppers or marketers on e-commerce platforms (namely human knowledge), which contain valuable information for product copywriting generation but are always accompanied by a lot of noise. In this work, we propose a novel solution for automatically generating product copywriting that involves the title, attributes, and denoised auxiliary knowledge. Specifically, we design an end-to-end generation framework equipped with two variational autoencoders that work interactively to select informative human knowledge and generate diverse copywriting.
Citations: 2
Lot or Not: Identifying Multi-Quantity Offerings in E-Commerce
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.29
G. Lavee, Ido Guy
The term “lot” is defined to mean an offering that contains a collection of multiple identical items for sale. In a large online marketplace, lot offerings play an important role, allowing buyers and sellers to set price levels to optimally balance supply and demand needs. In spite of their central role, platforms often struggle to identify lot offerings, since explicit lot status identification is frequently not provided by sellers. The ability to identify lot offerings plays a key role in many fundamental tasks, from matching offerings to catalog products, through ranking search results, to providing effective pricing guidance. In this work, we seek to determine the lot status (and lot size) of each offering, in order to facilitate an improved buyer experience, while reducing the friction for sellers posting new offerings. We demonstrate experimentally the ability to accurately classify offerings as lots and predict their lot size using only the offer title, by adapting state-of-the-art natural language techniques to the lot identification problem.
Citations: 1
Textual Content Moderation in C2C Marketplace
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.8
Yusuke Shido, Hsien-Chi Liu, Keisuke Umezawa
Automatic monitoring systems for inappropriate user-generated messages have been found to be effective in reducing human operation costs in Consumer to Consumer (C2C) marketplace services, in which customers send messages directly to other customers. We propose a lightweight neural network that takes a conversation as input, which we deployed to a production service. Our results show that the system reduced the human operation costs to less than one-sixth compared to the conventional rule-based monitoring at Mercari.
Citations: 2
Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.16
Wei-Te Chen, Yandi Xia, Keiji Shinzato
Although most studies have treated attribute value extraction (AVE) as named entity recognition, these approaches are not practical in real-world e-commerce platforms because they perform poorly and require canonicalization of extracted values. Furthermore, since the values needed for actual services are static for many attributes, extraction of new values is not always necessary. Given the above, we formalize AVE as extreme multi-label classification (XMC). A major problem in solving AVE as XMC is that the distribution between positive and negative labels for products is heavily imbalanced. To mitigate the negative impact derived from such biased distribution, we propose label masking, a simple and effective method to reduce the number of negative labels in training. We exploit attribute taxonomy designed for e-commerce platforms to determine which labels are negative for products. Experimental results using a dataset collected from a Japanese e-commerce platform demonstrate that label masking improves micro and macro F1 scores by 3.38 and 23.20 points, respectively.
Citations: 3
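One plausible reading of the label-masking idea in the abstract above is that labels which cannot apply to a product's category are left out of the training loss, so they no longer count as negatives. The sketch below shows that reading with a masked binary cross-entropy; the function `masked_bce`, the toy `TAXONOMY`, and all label names and probabilities are hypothetical, and the paper's actual formulation may differ.

```python
import math


def masked_bce(probs, positives, applicable):
    """Mean binary cross-entropy over the labels in `applicable` only.

    probs:      dict label -> predicted probability
    positives:  set of labels that are true for the product
    applicable: set of labels kept in the loss; all others are masked out
    """
    total, count = 0.0, 0
    for label, p in probs.items():
        if label not in applicable:
            continue  # masked: contributes neither as positive nor as negative
        y = 1.0 if label in positives else 0.0
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0


# Hypothetical taxonomy: category -> attribute-value labels that can apply.
TAXONOMY = {
    "shoes": {"color:red", "color:black", "size:26cm"},
    "wine": {"color:red", "region:bordeaux"},
}

probs = {"color:red": 0.9, "color:black": 0.2,
         "size:26cm": 0.1, "region:bordeaux": 0.6}
positives = {"color:red"}

full_loss = masked_bce(probs, positives, set(probs))            # 4 labels
masked_loss = masked_bce(probs, positives, TAXONOMY["shoes"])   # 3 labels
```

In this toy case the wine-only label `region:bordeaux` is dropped for a shoe product, so its (wrongly high) predicted probability no longer dominates the loss. With thousands of labels, the same mask would remove most of the negatives for each product.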
Improving Relevance Quality in Product Search using High-Precision Query-Product Semantic Similarity
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.6
Alireza Bagheri Garakani, Fan Yang, Wen-Yu Hua, Yetian Chen, Michinari Momma, Jingyuan Deng, Yanling Gao, Yi Sun
Ensuring relevance quality in product search is a critical task, as it impacts the customer’s ability to find intended products in the short term as well as the general perception and trust of the e-commerce system in the long term. In this work, we leverage a high-precision cross-encoder BERT model for semantic similarity between customer query and products and survey its effectiveness for three ranking applications where offline-generated scores could be used: (1) as an offline metric for estimating relevance quality impact, (2) as a re-ranking feature covering head/torso queries, and (3) as a training objective for optimization. We present results on the effectiveness of this strategy in a large e-commerce setting, which has general applicability for the choice of other high-precision models and tasks in ranking.
Citations: 1
Leveraging Seq2seq Language Generation for Multi-level Product Issue Identification
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.3
Yang Liu, Varnith Chordia, Hua Li, Siavash Fazeli Dehkordy, Yifei Sun, Vincent Gao, Na Zhang
In a leading e-commerce business, we receive hundreds of millions of pieces of customer feedback from different text communication channels such as product reviews. The feedback can contain rich information regarding customers’ dissatisfaction with the quality of goods and services. To harness such information to better serve customers, in this paper, we created a machine learning approach to automatically identify product issues and uncover root causes from the customer feedback text. We identify issues at two levels: coarse grained (L-Coarse) and fine grained (L-Granular). We formulate this multi-level product issue identification problem as a seq2seq language generation problem. Specifically, we utilize transformer-based seq2seq models due to their versatility and strong transfer-learning capability. We demonstrate that our approach is label efficient and outperforms traditional approaches such as the multi-class multi-label classification formulation. Based on human evaluation, our fine-tuned model achieves 82.1% and 95.4% human-level performance for L-Coarse and L-Granular issue identification, respectively. Furthermore, our experiments illustrate that the model can generalize to identify unseen L-Granular issues.
Citations: 0
Investigating the Generative Approach for Question Answering in E-Commerce
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.24
Kalyani Roy, Vineeth Balapanuru, Tapas Nayak, Pawan Goyal
Many e-commerce websites provide a Product-related Question Answering (PQA) platform where potential customers can ask questions related to a product, and other consumers can post an answer to that question based on their experience. Recently, there has been a growing interest in providing automated responses to product questions. In this paper, we investigate the suitability of the generative approach for PQA. We use state-of-the-art generative models proposed by Deng et al. (2020) and Lu et al. (2020) for this purpose. On closer examination, we find several drawbacks in this approach: (1) input reviews are not always utilized significantly for answer generation, (2) the performance of the models is abysmal when answering numerical questions, and (3) many of the generated answers contain phrases like “I do not know”, which are taken from the reference answers in the training data, and these answers do not convey any information to the customer. Although these approaches achieve a high ROUGE score, the score does not reflect these shortcomings of the generated answers. We hope that our analysis will lead to more rigorous PQA approaches, and future research will focus on addressing these shortcomings in PQA.
Citations: 1
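The abstract's point that a high ROUGE score can hide uninformative answers can be seen with a simplified ROUGE-1 F1 computation. This is a minimal sketch, assuming whitespace tokenization and no stemming, so it is not the official ROUGE implementation; the function name `rouge1_f1` and the example strings are made up for illustration.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer taken from user-written training data.
reference = "i do not know sorry"

uninformative = rouge1_f1("i do not know", reference)
informative = rouge1_f1("the battery lasts about ten hours", reference)
```

Here the generated answer "i do not know" gets precision 1.0 and recall 0.8 against the reference, while an answer that actually addresses the question shares no unigrams with it and scores 0.0. The metric rewards matching the reference, not helping the customer.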
Enhanced Representation with Contrastive Loss for Long-Tail Query Classification in e-commerce
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.ecnlp-1.17
Lvxing Zhu, Hao Chen, Chao Wei, Weiru Zhang
Query classification is a fundamental task in an e-commerce search engine, which assigns one or multiple predefined product categories in response to each search query. Taking click-through logs as training data in deep learning methods is a common and effective approach for query classification. However, the frequency distribution of queries typically has a long-tail property, which means that there are few logs for most of the queries. The lack of reliable user feedback information results in worse performance on long-tail queries compared with frequent queries. To solve the above problem, we propose a novel method that leverages an auxiliary module to enhance the representations of long-tail queries by taking advantage of reliable supervised information of variant frequent queries. The long-tail queries are guided by the contrastive loss to obtain category-aligned representations in the auxiliary module, where the variant frequent queries serve as anchors in the representation space. We train our model with real-world click data from AliExpress and conduct evaluation on both offline labeled data and online AB test. The results and further analysis demonstrate the effectiveness of our proposed method.
Citations: 6
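The abstract above describes long-tail queries being guided by a contrastive loss toward frequent-query anchors of the same category, but it does not give the loss itself. The following is only a minimal sketch of that idea, assuming an InfoNCE-style formulation over cosine similarities. The function name `anchor_contrastive_loss`, the `temperature` parameter, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np


def anchor_contrastive_loss(tail_emb, anchor_emb, tail_labels, anchor_labels,
                            temperature=0.1):
    """InfoNCE-style sketch (an assumption, not the paper's exact loss).

    Pulls each long-tail query embedding toward frequent-query anchors that
    share its category and pushes it away from anchors of other categories.
    """
    # L2-normalise so that dot products are cosine similarities.
    t = tail_emb / np.linalg.norm(tail_emb, axis=1, keepdims=True)
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)

    # Similarity of every long-tail query to every anchor, scaled by temperature.
    logits = (t @ a.T) / temperature

    # Numerically stable log-softmax over the anchors.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives are anchors with the same category label as the query.
    pos = tail_labels[:, None] == anchor_labels[None, :]
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0  # no query has a same-category anchor in this batch

    # Average negative log-probability of the positives, per query.
    neg_log = -(log_prob * pos).sum(axis=1)
    per_query = neg_log[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_query.mean())


# Toy usage: two anchors (categories 0 and 1) and two long-tail queries.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
anchor_labels = np.array([0, 1])
tail = np.array([[0.9, 0.1], [0.1, 0.9]])
tail_labels = np.array([0, 1])
loss = anchor_contrastive_loss(tail, anchors, tail_labels, anchor_labels)
```

In this sketch the loss is small when each long-tail embedding already lies close to its own category's anchor, and grows when it lies closer to an anchor of a different category, which is the category-alignment behaviour the abstract describes.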