
Proceedings of The 4th Workshop on e-Commerce and NLP: Latest Papers

Product Review Translation: Parallel Corpus Creation and Robustness towards User-generated Noisy Text
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.21
Kamal Kumar Gupta, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal
Reviews that users write about a particular product or service play an influential role in helping customers make informed decisions. Online e-commerce portals have immensely impacted our lives, but their content is predominantly in English, which often limits its widespread usage. The number of e-commerce users who are not proficient in English is growing exponentially. Hence, there is a need to make these services available in non-English languages, especially in a multilingual country like India. This can be achieved with a robust, in-domain machine translation (MT) system. However, user-written reviews pose unique challenges to MT, such as misspelled words, ungrammatical constructions, colloquial terms, and a lack of resources such as in-domain parallel corpora. We address these challenges by presenting an English–Hindi review-domain parallel corpus. We train an English-to-Hindi neural machine translation (NMT) system to translate the product reviews available on e-commerce websites. By training a Transformer-based NMT model on the generated data, we achieve a score of 33.26 BLEU points for English-to-Hindi translation. To make our NMT model robust enough to handle the noisy tokens in the reviews, we integrate a character-based language model to generate word vectors and map the noisy tokens to their correct forms. We ran experiments on four language pairs: English–Hindi, English–German, English–French, and English–Czech. These experiments show BLEU scores of 35.09, 28.91, 34.68, and 14.52, which are improvements of 1.61, 1.05, 1.63, and 1.94, respectively, over the baseline.
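The noisy-token mapping step can be illustrated with a minimal sketch. The paper derives word vectors from a character-based language model. Here, a bag of character bigrams stands in for those vectors, and each out-of-vocabulary token is mapped to its nearest clean vocabulary word by cosine similarity. The helper names, the bigram features and the 0.5 threshold are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def char_ngrams(word: str, n: int = 2) -> Counter:
    """Bag of character n-grams; '<' and '>' mark word boundaries.

    Stand-in (assumption) for the paper's character-LM word vectors.
    """
    padded = f"<{word.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize_token(token: str, vocab: set, threshold: float = 0.5) -> str:
    """Map a possibly misspelled token to its closest vocabulary word.

    Tokens already in the (lowercase) vocabulary are returned unchanged.
    A token with no candidate above the hypothetical threshold is also
    kept as-is.
    """
    if token.lower() in vocab:
        return token
    vec = char_ngrams(token)
    best, best_score = token, threshold
    for word in vocab:
        score = cosine(vec, char_ngrams(word))
        if score > best_score:
            best, best_score = word, score
    return best
```

For example, with the vocabulary `{"product", "quality", "battery"}`, the review tokens "prodcut" and "baterry" are mapped to "product" and "battery". A token that resembles nothing in the vocabulary is passed through untouched, so the translation model still sees it.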
Citations: 2
Combining semantic search and twin product classification for recognition of purchasable items in voice shopping
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.18
Dieu-Thu Le, Verena Weber, Melanie Bradford
The accuracy of an online shopping system operated via voice commands is particularly important and may have a great impact on customer trust. This paper focuses on the problem of detecting whether an utterance contains actual, purchasable products, and thus refers to a shopping-related intent. We study this problem in a typical Spoken Language Understanding architecture consisting of an intent classifier and a slot detector. Searching through billions of products to check whether a detected slot is a purchasable item is prohibitively expensive. To overcome this problem, we present a framework that (1) uses a retrieval module that returns the most relevant products with respect to the detected slot, and (2) combines it with a twin network that decides whether the detected slot is indeed a purchasable item. Through various experiments, we show that this architecture outperforms a typical slot detector approach, with gains of +81% in accuracy and +41% in F1 score.
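The two-stage retrieve-then-verify idea can be sketched under simplifying assumptions. A cheap retrieval step narrows the catalog to the top-k candidates. A "twin" check then applies one shared encoder to both the slot text and each candidate. In this sketch, the learned encoder is replaced by character bigram counts and the trained twin classifier by a fixed cosine threshold. `encode`, `retrieve`, `is_purchasable` and the 0.6 threshold are hypothetical names and values, not the paper's.

```python
import heapq
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Shared (twin) encoder stand-in: bag of character bigrams (assumption)."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(slot: str, catalog: list, k: int = 3) -> list:
    """Stage 1: return the k catalog titles most similar to the detected slot."""
    query = encode(slot)
    return heapq.nlargest(k, catalog, key=lambda title: cosine(query, encode(title)))


def is_purchasable(slot: str, catalog: list, k: int = 3, threshold: float = 0.6) -> bool:
    """Stage 2: compare the slot with each retrieved candidate.

    The same encoder is applied to both sides of every pair. A fixed
    similarity threshold stands in for a trained twin network.
    """
    query = encode(slot)
    return any(cosine(query, encode(c)) >= threshold for c in retrieve(slot, catalog, k))
```

Only k candidates reach the pairwise check, so the expensive comparison never scans the full catalog. That cost argument is the one the abstract makes against exhaustive search.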
Citations: 0
Attribute Value Generation from Product Title using Language Models
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.2
Kalyani Roy, Pawan Goyal, Manish Pandey
Identifying the values of product attributes is essential for many e-commerce functions, such as product search and product recommendation. Therefore, identifying attribute values from unstructured product descriptions is a critical undertaking for any e-commerce retailer. What makes this problem challenging is the diversity of product types and of their attributes and values. Existing methods have typically employed multiple types of machine learning models, each of which handles specific product types or attribute classes. This has limited their scalability and generalization in large-scale, real-world e-commerce applications. Previous approaches to this task have formulated attribute value extraction as a Named Entity Recognition (NER) task or a Question Answering (QA) task. In this paper, we present a generative approach to the attribute value extraction problem using language models. We leverage the large-scale pretraining of GPT-2 and the T5 text-to-text transformer to create fine-tuned models that can effectively perform this task. We show that a single general model is very effective for this task across a broad set of product attribute values under the open-world assumption. Our approach achieves state-of-the-art performance for different attribute classes, a task that previously required a diverse set of models.
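In the generative formulation, each (title, attribute) query becomes a source string, and the attribute value itself becomes the target string to generate. Below is a minimal sketch of preparing such pairs for a text-to-text model. The prompt template `attribute: ... title: ...` and the `none` target for titles that lack the attribute are hypothetical choices, not the paper's exact format.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical target string emitted when a title does not mention the attribute.
NONE_TARGET = "none"


def to_text_pair(title: str, attribute: str, value: Optional[str]) -> Tuple[str, str]:
    """Format one (title, attribute) query as a text-to-text source/target pair."""
    source = f"attribute: {attribute} title: {title}"  # assumed prompt template
    target = value if value is not None else NONE_TARGET
    return source, target


def build_dataset(rows: Iterable[Tuple[str, str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Turn (title, attribute, value) rows into source/target pairs for fine-tuning."""
    return [to_text_pair(title, attr, value) for title, attr, value in rows]


def parse_generation(text: str) -> Optional[str]:
    """Map a generated string back to an attribute value (None if absent)."""
    cleaned = text.strip()
    return None if cleaned.lower() == NONE_TARGET else cleaned
```

The target is free text rather than a label from a fixed inventory, so one model can emit values it never saw during training. This is the open-world setting the abstract refers to. The fine-tuning loop itself (for example, for T5) is omitted here.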
Citations: 11
Unsupervised Class-Specific Abstractive Summarization of Customer Reviews
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.11
Thi Thuy Anh Nguyen, Mingwei Shen, K. Hovsepian
Large-scale unsupervised abstractive summarization is sorely needed to automatically scan millions of customer reviews in today's fast-paced e-commerce landscape. We address a key challenge in unsupervised abstractive summarization: reducing generic, uninformative content and producing useful information that relates to specific product aspects. To do so, we propose to model reviews in the context of topical classes of interest. For any arbitrary set of such classes, the proposed model learns to generate a set of class-specific summaries from multiple reviews of each product, without ground-truth summaries. The only required signal is the class probabilities or class label of each review. The model combines a generative variational autoencoder with an integrated class-correlation gating mechanism and a hierarchical structure that captures the dependencies among products, reviews, and classes. Human evaluation shows that the generated summaries are highly relevant, fluent, and representative. Evaluation on a reference dataset shows that our model outperforms state-of-the-art abstractive and extractive baselines.
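Per-review class probabilities can steer a summary toward one class, as in this deliberately simplified sketch. Each review's latent code (for example, a VAE posterior mean) is pooled into one code per class, weighted by how strongly that review belongs to the class. A decoder, not shown, would then generate a summary from each pooled code. The fixed weighted mean is an assumption standing in for the paper's learned class-correlation gating; it is not the authors' mechanism.

```python
from typing import List, Sequence


def class_specific_codes(review_codes: Sequence[Sequence[float]],
                         class_probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pool per-review latent codes into one code per class.

    review_codes: R vectors of dimension D (one latent code per review).
    class_probs:  R rows of C class probabilities (the weak per-review signal).
    Returns C vectors of dimension D, each a probability-weighted mean.
    """
    if not review_codes or len(review_codes) != len(class_probs):
        raise ValueError("need one probability row per review")
    num_classes = len(class_probs[0])
    dim = len(review_codes[0])
    pooled = []
    for c in range(num_classes):
        weights = [probs[c] for probs in class_probs]
        total = sum(weights)
        if total == 0:
            pooled.append([0.0] * dim)  # no review speaks to this class
            continue
        pooled.append([
            sum(w * code[d] for w, code in zip(weights, review_codes)) / total
            for d in range(dim)
        ])
    return pooled
```

Reviews that barely touch a class contribute little to that class's code. This is the intuition behind using class probabilities, rather than gold summaries, as the only supervision.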
Citations: 4
SupportNet: Neural Networks for Summary Generation and Key Segment Extraction from Technical Support Tickets
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.20
Vinayshekhar Bannihatti Kumar, Mohan Yarramsetty, Sharon Sun, Anukul Goel
We improve customer experience and gain customers' trust when their issues are resolved rapidly and with less friction. Existing work has focused on reducing overall case resolution time by binning each case into predefined categories and routing it to the appropriate support engineer. However, the actions the engineer takes during case analysis and resolution are entirely ignored, even though they account for the bulk of the resolution time. In this work, we propose two systems that enable support engineers to resolve cases faster. The first, a guidance extraction model, mines historical cases and provides technical guidance phrases to support engineers. Engineers can use these phrases to educate the customer or to obtain the critical information needed to resolve the case, minimizing the number of exchanges between engineer and customer. The second, a summarization model, creates an abstractive summary of the case to give the support engineer better context. In quantitative evaluation, we obtain an F1 score of 0.64 for the guidance extraction model and a BERTScore (F1) of 0.55 for the summarization model.
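The abstract reports an F1 score for guidance extraction but does not say how predicted and reference segments are matched. One common convention for scoring extracted text spans is bag-of-tokens F1, in the style of extractive QA evaluation. The sketch below assumes that convention, with lowercase whitespace tokenization; it may differ from the paper's protocol.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Bag-of-tokens F1 between a predicted segment and a reference segment.

    Assumed convention: lowercase, whitespace tokenization, multiset overlap.
    """
    pred = predicted.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, the prediction "restart the instance" against the reference "please restart the instance" has precision 1.0 and recall 0.75. That gives F1 = 6/7 ≈ 0.857. Averaging such scores over a test set gives a corpus-level figure comparable in kind, though not necessarily in protocol, to the reported 0.64.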
Citations: 1
Detect Profane Language in Streaming Services to Protect Young Audiences
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.15
Jingxiang Chen, Kaimin Wei, Xiang Hao
With the rapid growth of online video streaming, recent years have seen increasing concern about profane language in streamed content. Detecting profane language in streaming services is challenging because of the long sentences that appear in videos. Recent research on handling long sentences has focused on developing deep learning modeling techniques, but little work has focused on improving data pipelines. In this work, we develop a data collection pipeline that handles long text sequences and integrate it with a multi-head self-attention model. With this pipeline, our experiments show that the self-attention model offers a 12.5% relative accuracy improvement over the state-of-the-art DistilBERT model on profane language detection, while requiring only 3% of the parameters. This research contributes a better system for informing users about profane language in video streaming services.
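The abstract does not describe how the pipeline handles long text sequences. One standard way to fit long subtitle text to a fixed-length classifier is overlapping sliding windows. In the sketch below, a video's text is flagged if any window scores above a threshold. The window and stride sizes, the any-window aggregation and the `score_window` callable are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


def sliding_windows(tokens: Sequence[str], window: int = 64, stride: int = 48) -> List[Sequence[str]]:
    """Split a long token sequence into overlapping windows that cover every token."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # last window reaches the end
            break
        start += stride
    return chunks


def flag_long_text(tokens: Sequence[str],
                   score_window: Callable[[Sequence[str]], float],
                   threshold: float = 0.5,
                   window: int = 64,
                   stride: int = 48) -> bool:
    """Flag the text if any window's (hypothetical) profanity score crosses the threshold."""
    return any(score_window(chunk) >= threshold
               for chunk in sliding_windows(tokens, window, stride))
```

The overlap (window minus stride) keeps a phrase that straddles a window boundary intact in at least one chunk, provided the phrase is no longer than the overlap.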
Citations: 1
Exploring Inspiration Sets in a Data Programming Pipeline for Product Moderation
Pub Date: 2021-08-01 DOI: 10.18653/v1/2021.ecnlp-1.16
Justin Winkler, Simon Brugman, Bas van Berkel, M. Larson
We carry out a case study on using data programming to create training data for the classifiers used for product moderation on a large e-commerce platform. Data programming is a recently introduced technique that uses human-defined rules to generate training data sets without tedious item-by-item hand labeling. Our study investigates methods that allow product moderators to quickly modify the rules using their knowledge of the domain and, especially, of textual item descriptions. Our results are promising: moderators can use this approach to steer the training data, enabling fast and close control over the classifiers that detect policy violations.
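In data programming, moderators write small labeling functions, each of which votes on an item or abstains. The votes are then combined into a training label. Below is a minimal sketch that combines votes by simple majority, with ties and all-abstain items left unlabeled. Systems such as Snorkel instead learn a label model that weights the functions. The three rules shown are hypothetical examples, not the platform's actual policies.

```python
from collections import Counter

ABSTAIN, OK, VIOLATION = -1, 0, 1


# Hypothetical moderator-written labeling functions over item descriptions.
def lf_replica_designer(text: str) -> int:
    t = text.lower()
    return VIOLATION if "replica" in t and "designer" in t else ABSTAIN


def lf_weapon_terms(text: str) -> int:
    t = text.lower()
    return VIOLATION if any(term in t for term in ("switchblade", "brass knuckles")) else ABSTAIN


def lf_toy(text: str) -> int:
    return OK if "toy" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_replica_designer, lf_weapon_terms, lf_toy]


def majority_vote(text: str, lfs=LABELING_FUNCTIONS) -> int:
    """Combine non-abstaining votes by majority; ties and no votes give ABSTAIN."""
    counts = Counter(v for v in (lf(text) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # tied vote: leave the item unlabeled
    return top
```

Editing a rule and re-running `majority_vote` over the catalog regenerates the training labels. This rapid edit-and-relabel loop is how moderators can steer the data that a downstream classifier learns from.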
Citations: 0