While the link between color and emotion has been widely studied, how context-based changes in color impact the intensity of perceived emotions is not well understood. In this work, we present a new multimodal dataset for exploring the emotional connotation of color as mediated by line, stroke, texture, shape, and language. Our dataset, FeelingBlue, is a collection of 19,788 4-tuples of abstract art ranked by annotators according to their evoked emotions and paired with rationales for those annotations. Using this corpus, we present a baseline for a new task: Justified Affect Transformation. Given an image I, the task is to 1) recolor I to enhance a specified emotion e and 2) provide a textual justification for the change in e. Our model is an ensemble of deep neural networks that takes I, generates an emotionally transformed color palette p conditioned on I, applies p to I, and then justifies the color transformation in text via a visual-linguistic model. Experimental results shed light on the emotional connotation of color in context, demonstrating both the promise of our approach on this challenging task and the considerable potential for future investigations enabled by our corpus.
{"title":"FeelingBlue: A Corpus for Understanding the Emotional Connotation of Color in Context","authors":"Amith Ananthram, Olivia Winn, S. Muresan","doi":"10.1162/tacl_a_00540","DOIUrl":"https://doi.org/10.1162/tacl_a_00540","url":null,"abstract":"While the link between color and emotion has been widely studied, how context-based changes in color impact the intensity of perceived emotions is not well understood. In this work, we present a new multimodal dataset for exploring the emotional connotation of color as mediated by line, stroke, texture, shape, and language. Our dataset, FeelingBlue, is a collection of 19,788 4-tuples of abstract art ranked by annotators according to their evoked emotions and paired with rationales for those annotations. Using this corpus, we present a baseline for a new task: Justified Affect Transformation. Given an image I, the task is to 1) recolor I to enhance a specified emotion e and 2) provide a textual justification for the change in e. Our model is an ensemble of deep neural networks which takes I, generates an emotionally transformed color palette p conditioned on I, applies p to I, and then justifies the color transformation in text via a visual-linguistic model. Experimental results shed light on the emotional connotation of color in context, demonstrating both the promise of our approach on this challenging task and the considerable potential for future investigations enabled by our corpus.1","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"176-190"},"PeriodicalIF":10.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44321137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the last few years, the natural language processing community has witnessed advances in neural representations of free text with transformer-based language models (LMs). Given the importance of knowledge available in tabular data, recent research efforts extend LMs by developing neural representations for structured data. In this article, we present a survey that analyzes these efforts. We first abstract the different systems according to a traditional machine learning pipeline in terms of training data, input representation, model training, and supported downstream tasks. For each aspect, we characterize and compare the proposed solutions. Finally, we discuss future work directions.
{"title":"Transformers for Tabular Data Representation: A Survey of Models and Applications","authors":"Gilbert Badaro, Mohammed Saeed, Paolo Papotti","doi":"10.1162/tacl_a_00544","DOIUrl":"https://doi.org/10.1162/tacl_a_00544","url":null,"abstract":"In the last few years, the natural language processing community has witnessed advances in neural representations of free texts with transformer-based language models (LMs). Given the importance of knowledge available in tabular data, recent research efforts extend LMs by developing neural representations for structured data. In this article, we present a survey that analyzes these efforts. We first abstract the different systems according to a traditional machine learning pipeline in terms of training data, input representation, model training, and supported downstream tasks. For each aspect, we characterize and compare the proposed solutions. Finally, we discuss future work directions.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"227-249"},"PeriodicalIF":10.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42574974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.
{"title":"Hate Speech Classifiers Learn Normative Social Stereotypes","authors":"A. Davani, M. Atari, Brendan Kennedy, Morteza Dehghani","doi":"10.1162/tacl_a_00550","DOIUrl":"https://doi.org/10.1162/tacl_a_00550","url":null,"abstract":"Social stereotypes negatively impact individuals’ judgments about different groups and may have a critical role in understanding language directed toward marginalized groups. Here, we assess the role of social stereotypes in the automated detection of hate speech in the English language by examining the impact of social stereotypes on annotation behaviors, annotated datasets, and hate speech classifiers. Specifically, we first investigate the impact of novice annotators’ stereotypes on their hate-speech-annotation behavior. Then, we examine the effect of normative stereotypes in language on the aggregated annotators’ judgments in a large annotated corpus. Finally, we demonstrate how normative stereotypes embedded in language resources are associated with systematic prediction errors in a hate-speech classifier. The results demonstrate that hate-speech classifiers reflect social stereotypes against marginalized groups, which can perpetuate social inequalities when propagated at scale. This framework, combining social-psychological and computational-linguistic methods, provides insights into sources of bias in hate-speech moderation, informing ongoing debates regarding machine learning fairness.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"300-319"},"PeriodicalIF":10.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46301957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We extend a pair of continuous combinator-based constituency parsers (one binary and one multi-branching) into a discontinuous pair. Our parsers iteratively compose constituent vectors from word embeddings without any grammar constraints. Their empirical complexities are subquadratic. Our extension includes 1) a swap action for the orientation-based binary model and 2) biaffine attention for the chunker-based multi-branching model. In tests conducted with the Discontinuous Penn Treebank and TIGER Treebank, we achieved state-of-the-art discontinuous accuracy with a significant speed advantage.
{"title":"Discontinuous Combinatory Constituency Parsing","authors":"Zhousi Chen, Mamoru Komachi","doi":"10.1162/tacl_a_00546","DOIUrl":"https://doi.org/10.1162/tacl_a_00546","url":null,"abstract":"We extend a pair of continuous combinator-based constituency parsers (one binary and one multi-branching) into a discontinuous pair. Our parsers iteratively compose constituent vectors from word embeddings without any grammar constraints. Their empirical complexities are subquadratic. Our extension includes 1) a swap action for the orientation-based binary model and 2) biaffine attention for the chunker-based multi-branching model. In tests conducted with the Discontinuous Penn Treebank and TIGER Treebank, we achieved state-of-the-art discontinuous accuracy with a significant speed advantage.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"267-283"},"PeriodicalIF":10.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46282893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a novel graph-based approach for semantic parsing that resolves two problems observed in the literature: (1) seq2seq models fail on compositional generalization tasks; (2) previous work using phrase structure parsers cannot cover all the semantic parses observed in treebanks. We prove that both MAP inference and latent tag anchoring (required for weakly-supervised learning) are NP-hard problems. We propose two optimization algorithms based on constraint smoothing and conditional gradient to approximately solve these inference problems. Experimentally, our approach delivers state-of-the-art results on GeoQuery, Scan, and Clevr, both for i.i.d. splits and for splits that test for compositional generalization.
{"title":"On Graph-based Reentrancy-free Semantic Parsing","authors":"Alban Petit, Caio Corro","doi":"10.1162/tacl_a_00570","DOIUrl":"https://doi.org/10.1162/tacl_a_00570","url":null,"abstract":"We propose a novel graph-based approach for semantic parsing that resolves two problems observed in the literature: (1) seq2seq models fail on compositional generalization tasks; (2) previous work using phrase structure parsers cannot cover all the semantic parses observed in treebanks. We prove that both MAP inference and latent tag anchoring (required for weakly-supervised learning) are NP-hard problems. We propose two optimization algorithms based on constraint smoothing and conditional gradient to approximately solve these inference problems. Experimentally, our approach delivers state-of-the-art results on GeoQuery, Scan, and Clevr, both for i.i.d. splits and for splits that test for compositional generalization.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"703-722"},"PeriodicalIF":10.9,"publicationDate":"2023-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46924311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present the Assignment-Maximization Spectral Attribute removaL (AMSAL) algorithm, which erases information from neural representations when the information to be erased is implicit rather than directly aligned to each input example. Our algorithm works by alternating between two steps. In one, it finds an assignment of the input representations to the information to be erased, and in the other, it creates projections of both the input representations and the information to be erased into a joint latent space. We test our algorithm on an extensive array of datasets, including a Twitter dataset with multiple guarded attributes, the BiasBios dataset, and the BiasBench benchmark. The latter benchmark includes four datasets with various types of protected attributes. Our results demonstrate that bias can often be removed in our setup. We also discuss the limitations of our approach when there is a strong entanglement between the main task and the information to be erased.
{"title":"Erasure of Unaligned Attributes from Neural Representations","authors":"Shun Shao, Yftah Ziser, Shay B. Cohen","doi":"10.1162/tacl_a_00558","DOIUrl":"https://doi.org/10.1162/tacl_a_00558","url":null,"abstract":"We present the Assignment-Maximization Spectral Attribute removaL (AMSAL) algorithm, which erases information from neural representations when the information to be erased is implicit rather than directly being aligned to each input example. Our algorithm works by alternating between two steps. In one, it finds an assignment of the input representations to the information to be erased, and in the other, it creates projections of both the input representations and the information to be erased into a joint latent space. We test our algorithm on an extensive array of datasets, including a Twitter dataset with multiple guarded attributes, the BiasBios dataset, and the BiasBench benchmark. The latter benchmark includes four datasets with various types of protected attributes. Our results demonstrate that bias can often be removed in our setup. We also discuss the limitations of our approach when there is a strong entanglement between the main task and the information to be erased.1","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"488-510"},"PeriodicalIF":10.9,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43206955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sequence-to-Sequence (S2S) models have achieved remarkable success on various text generation tasks. However, learning complex structures with S2S models remains challenging, as external neural modules and additional lexicons are often supplemented to predict non-textual outputs. We present a systematic study of S2S modeling using constrained decoding on four core tasks: part-of-speech tagging, named entity recognition, constituency parsing, and dependency parsing, to develop efficient methods that cost zero extra parameters. In particular, three lexically diverse linearization schemas and corresponding constrained decoding methods are designed and evaluated. Experiments show that although more lexicalized schemas yield longer output sequences that require heavier training, their sequences being closer to natural language makes them easier to learn. Moreover, S2S models using our constrained decoding outperform other S2S approaches using external resources. Our best models perform better than or comparably to the state-of-the-art on all four tasks, highlighting the promise of S2S models for generating non-sequential structures.
{"title":"Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing","authors":"Han He, Jinho D. Choi","doi":"10.1162/tacl_a_00557","DOIUrl":"https://doi.org/10.1162/tacl_a_00557","url":null,"abstract":"Sequence-to-Sequence (S2S) models have achieved remarkable success on various text generation tasks. However, learning complex structures with S2S models remains challenging as external neural modules and additional lexicons are often supplemented to predict non-textual outputs. We present a systematic study of S2S modeling using contained decoding on four core tasks: part-of-speech tagging, named entity recognition, constituency, and dependency parsing, to develop efficient exploitation methods costing zero extra parameters. In particular, 3 lexically diverse linearization schemas and corresponding constrained decoding methods are designed and evaluated. Experiments show that although more lexicalized schemas yield longer output sequences that require heavier training, their sequences being closer to natural language makes them easier to learn. Moreover, S2S models using our constrained decoding outperform other S2S approaches using external resources. Our best models perform better than or comparably to the state-of-the-art for all 4 tasks, lighting a promise for S2S models to generate non-sequential structures.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"582-599"},"PeriodicalIF":10.9,"publicationDate":"2023-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42828943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most previous work in music emotion recognition assumes a single or a few song-level labels for the whole song. While it is known that different emotions can vary in intensity within a song, annotated data for this setup is scarce and difficult to obtain. In this work, we propose a method to predict emotion dynamics in song lyrics without song-level supervision. We frame each song as a time series and employ a State Space Model (SSM), combining a sentence-level emotion predictor with an Expectation-Maximization (EM) procedure to generate the full emotion dynamics. Our experiments show that applying our method consistently improves the performance of sentence-level baselines without requiring any annotated songs, making it ideal for limited training data scenarios. Further analysis through case studies shows the benefits of our method while also indicating the limitations and pointing to future directions.
{"title":"Modeling Emotion Dynamics in Song Lyrics with State Space Models","authors":"Yingjin Song, Daniel Beck","doi":"10.1162/tacl_a_00541","DOIUrl":"https://doi.org/10.1162/tacl_a_00541","url":null,"abstract":"Most previous work in music emotion recognition assumes a single or a few song-level labels for the whole song. While it is known that different emotions can vary in intensity within a song, annotated data for this setup is scarce and difficult to obtain. In this work, we propose a method to predict emotion dynamics in song lyrics without song-level supervision. We frame each song as a time series and employ a State Space Model (SSM), combining a sentence-level emotion predictor with an Expectation-Maximization (EM) procedure to generate the full emotion dynamics. Our experiments show that applying our method consistently improves the performance of sentence-level baselines without requiring any annotated songs, making it ideal for limited training data scenarios. Further analysis through case studies shows the benefits of our method while also indicating the limitations and pointing to future directions.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"157-175"},"PeriodicalIF":10.9,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43532889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners.
{"title":"Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off","authors":"Yuchen Lian, Arianna Bisazza, T. Verhoef","doi":"10.1162/tacl_a_00587","DOIUrl":"https://doi.org/10.1162/tacl_a_00587","url":null,"abstract":"Abstract Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"1033-1047"},"PeriodicalIF":10.9,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43044930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current work on image-based story generation suffers from the fact that existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories, which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence collection and filtering process has allowed us to obtain stories that are more coherent, diverse, and visually grounded than in previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and diverse than stories generated with the current state-of-the-art model. Our code, image features, annotations, and collected stories are available at https://vwprompt.github.io/.
{"title":"Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences","authors":"Xudong Hong, A. Sayeed, K. Mehra, Vera Demberg, B. Schiele","doi":"10.1162/tacl_a_00553","DOIUrl":"https://doi.org/10.1162/tacl_a_00553","url":null,"abstract":"Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence collection and filtering process has allowed us to obtain stories that are more coherent, diverse, and visually grounded compared to previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and diverse than stories generated with the current state-of-the-art model. Our code, image features, annotations and collected stories are available at https://vwprompt.github.io/.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"11 1","pages":"565-581"},"PeriodicalIF":10.9,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45975746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}