Latest publications in Findings (Sydney (N.S.W.)

Context Generation Improves Open Domain Question Answering
Pub Date : 2022-10-12 DOI: 10.48550/arXiv.2210.06349
Dan Su, M. Patwary, Shrimai Prabhumoye, Peng Xu, R. Prenger, M. Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, such approaches do not fully exploit the parameterized knowledge. To address this inefficiency, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract the relevant knowledge and answer a question. We first generate a related context for a given question by prompting a pretrained LM. We then prompt the same LM to generate an answer using the generated context and the question. Additionally, we marginalize over the generated contexts to improve accuracy and reduce context uncertainty. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods. For example, on TriviaQA, our method improves exact-match accuracy from 55.3% to 68.6%, and is on par with open-book QA methods (68.6% vs. 68.0%). Our results show that our new methodology is able to better exploit the knowledge stored in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge.
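The generate-then-answer pipeline this abstract describes could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `lm_generate` callable, the prompt wording, and the use of majority voting as a stand-in for probabilistic marginalization over contexts are all assumptions.

```python
from collections import Counter

def answer_with_generated_contexts(question, lm_generate, n_contexts=4):
    """Two-stage closed-book QA sketch: (1) prompt the LM to generate a
    context for the question, (2) prompt the same LM to answer given that
    context, then combine answers across several generated contexts."""
    answers = []
    for _ in range(n_contexts):
        context = lm_generate(f"Generate a background passage for: {question}")
        answer = lm_generate(f"Context: {context}\nQuestion: {question}\nAnswer:")
        answers.append(answer.strip())
    # Majority vote over sampled contexts is a crude stand-in for
    # marginalizing over the generated-context distribution.
    return Counter(answers).most_common(1)[0][0]
```

With a deterministic LM stub, the function returns whichever answer most of the generated contexts support.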
Pages: 781-796
Citations: 3
Zero-Shot On-the-Fly Event Schema Induction
Pub Date : 2022-10-12 DOI: 10.48550/arXiv.2210.06254
Rotem Dror, Haoyu Wang, D. Roth
What are the events involved in a pandemic outbreak? What steps should be taken when planning a wedding? The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant information, and analyzing it. We present a new approach in which large language models are utilized to generate source documents that allow predicting, given a high-level event definition, the specific events, arguments, and relations between them, in order to construct a schema that describes the complex event in its entirety. Using our model, complete schemas on any topic can be generated on-the-fly without any manual data collection, i.e., in a zero-shot manner. Moreover, we develop efficient methods to extract pertinent information from texts and demonstrate in a series of experiments that these schemas are considered to be more complete than human-curated ones in the majority of examined scenarios. Finally, we show that this framework is comparable in performance with previous supervised schema induction methods that rely on collecting real texts, and even reaches the best score in the prediction task.
Pages: 693-713
Citations: 8
PriMeSRL-Eval: A Practical Quality Metric for Semantic Role Labeling Systems Evaluation
Pub Date : 2022-10-12 DOI: 10.48550/arXiv.2210.06408
Ishan Jindal, Alexandre Rademaker, Khoi-Nguyen Tran, Huaiyu Zhu, H. Kanayama, Marina Danilevsky, Yunyao Li
Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this error propagation. They either evaluate arguments independent of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding an inaccurate picture of SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a stricter SRL evaluation metric, PriMeSRL. We observe that under PriMeSRL, the quality evaluation of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRL successfully penalizes actual failures in SoTA SRL models.
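The error-propagation idea in this abstract can be made concrete with a toy scorer. This is a hypothetical simplification, not the PriMeSRL script itself: the point it illustrates is that a strict metric gives an argument credit only when the predicate sense it attaches to is also correct, so sense errors propagate into the argument score.

```python
def strict_argument_score(gold, pred):
    """Toy strict SRL scoring: gold/pred map a predicate id to
    (sense, {arg_span: role}). An argument counts as correct only if
    both its span/role AND the predicate sense match the gold."""
    correct = total = 0
    for pred_id, (gold_sense, gold_args) in gold.items():
        pred_sense, pred_args = pred.get(pred_id, (None, {}))
        for span, role in gold_args.items():
            total += 1
            # A lenient scorer would check only the span and role;
            # here the predicate sense must also be right for credit.
            if pred_sense == gold_sense and pred_args.get(span) == role:
                correct += 1
    return correct / total if total else 0.0
```

Under this scorer, a system with perfect arguments but a wrong predicate sense scores 0 on those arguments, whereas a sense-agnostic scorer would give it full credit.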
Pages: 1761-1773
Citations: 0
Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05404
Zifan Jiang, Amit Moryossef, Mathias Muller, Sarah Ebling
This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup—translating from American Sign Language to (American) English—our method achieves over 30 BLEU, while in two multilingual setups—translating in both directions between spoken languages and signed languages—we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
Pages: 1661-1679
Citations: 8
ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities
Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05556
Terry Yue Zhuo, Yaqing Liao, Yuecheng Lei, Lizhen Qu, Gerard de Melo, Xiaojun Chang, Yazhou Ren, Zenglin Xu
We introduce ViLPAct, a novel vision-language benchmark for human activity planning. It is designed for a task where embodied AI agents can reason about and forecast future actions of humans based on video clips about their initial activities and intents in text. The dataset consists of 2.9k videos from Charades, extended with intents via crowdsourcing, a multiple-choice question test set, and four strong baselines. One of the baselines implements a neurosymbolic approach based on a multi-modal knowledge base (MKB), while the others are deep generative models adapted from recent state-of-the-art (SOTA) methods. According to our extensive experiments, the key challenges are compositional generalization and effective use of information from both modalities.
Pages: 2147-2162
Citations: 0
Hierarchical3D Adapters for Long Video-to-text Summarization
Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.04829
Pinelopi Papalampidi, Mirella Lapata
In this paper, we focus on video-to-text summarization and investigate how to best utilize multimodal information for summarizing long inputs (e.g., an hour-long TV show) into long outputs (e.g., a multi-sentence summary). We extend SummScreen (Chen et al., 2022), a dialogue summarization dataset consisting of transcripts of TV episodes with reference summaries, and create a multimodal variant by collecting corresponding full-length videos. We incorporate multimodal information into a pre-trained textual summarizer efficiently using adapter modules augmented with a hierarchical structure while tuning only 3.8% of model parameters. Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
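The "tuning only 3.8% of model parameters" claim amounts to freezing the pretrained summarizer and updating only the inserted adapter modules. A tiny bookkeeping sketch of that idea follows; the module names and parameter counts here are invented for illustration, not taken from the paper.

```python
def trainable_fraction(module_params, trainable_names):
    """Given a mapping of module name -> parameter count and the set of
    module names left unfrozen (e.g. adapters), return the fraction of
    all parameters that remain trainable."""
    total = sum(module_params.values())
    trainable = sum(
        count for name, count in module_params.items()
        if name in trainable_names
    )
    return trainable / total
```

For example, freezing a 960-parameter backbone and training only a 40-parameter adapter leaves 4% of parameters trainable, in the same spirit as the paper's 3.8% figure.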
Pages: 1267-1290
Citations: 2
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.05038
Pedro Rodriguez, Mahmoud Azab, Becka Silvert, Renato Sanchez, Linzy Labson, Hardik Shah, Seungwhan Moon
Searching troves of videos with textual descriptions is a core multimodal retrieval task. Owing to the lack of a purpose-built dataset for text-to-video retrieval, video captioning datasets have been re-purposed to evaluate models by (1) treating captions as positive matches to their respective videos and (2) assuming all other videos to be negatives. However, this methodology leads to a fundamental flaw during evaluation: since captions are marked as relevant only to their original video, many alternate videos also match the caption, which introduces false-negative caption-video pairs. We show that when these false negatives are corrected, a recent state-of-the-art model gains 25% recall points—a difference that threatens the validity of the benchmark itself. To diagnose and mitigate this issue, we annotate and release 683K additional caption-video pairs. Using these, we recompute effectiveness scores for three models on two standard benchmarks (MSR-VTT and MSVD). We find that (1) the recomputed metrics are up to 25% recall points higher for the best models, (2) these benchmarks are nearing saturation for Recall@10, (3) caption length (generality) is related to the number of positives, and (4) annotation costs can be mitigated through sampling. We recommend retiring these benchmarks in their current form, and we make recommendations for future text-to-video retrieval benchmarks.
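The false-negative effect this abstract describes shows up directly in how Recall@k is computed: if each query is credited with only its original caption-video pair, a model that retrieves an equally relevant alternate video is scored as wrong. A small sketch (data and ids hypothetical) makes the correction visible:

```python
def recall_at_k(rankings, relevant, k=10):
    """Recall@k sketch: fraction of queries with at least one relevant
    item in the top-k. `relevant` maps query -> set of relevant video
    ids, so corrected annotations can add formerly false-negative
    videos simply by enlarging that set."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if any(video in relevant[query] for video in ranked[:k])
    )
    return hits / len(rankings)
```

If a model ranks an alternate-but-valid video first, the original single-positive annotation scores it 0, while the corrected multi-positive annotation scores it 1, which is the kind of gap the paper quantifies at up to 25 recall points.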
Pages: 47-68
Citations: 0
Evaluating Rules for Aggregating Satisfaction with Activity-travel Episodes to a Day-level Satisfaction Measure
Pub Date : 2022-10-03 DOI: 10.32866/001c.38543
Wenbo Guo, T. Schwanen, C. Brand, Y. Chai
The recent interest in developing subjective wellbeing aggregation rules in transport research has triggered dialogue across disciplines. Here we analyze how 10 different aggregation rules result in different day-level indicators of satisfaction based on separate measures for each activity and trip on the day and compare the resulting distribution of day-level scores with those for life satisfaction. We find that the normative rules outperform the heuristic rules and are best used to create day-level indicators of satisfaction with activities and trips if the aim is to mimic the statistical distribution for life satisfaction scores.
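To make the notion of an "aggregation rule" concrete, here is a sketch of three candidate rules for collapsing per-episode satisfaction scores into one day-level score. The rule names and formulas are illustrative stand-ins, not the ten rules the paper actually evaluates.

```python
def day_satisfaction(episodes, rule="duration_weighted"):
    """Aggregate (score, duration_minutes) pairs for a day's activity
    and travel episodes into a single day-level satisfaction score."""
    scores = [score for score, _ in episodes]
    if rule == "mean":                 # heuristic: simple average
        return sum(scores) / len(scores)
    if rule == "minimum":              # heuristic: worst episode dominates
        return min(scores)
    if rule == "duration_weighted":    # normative-style time weighting
        total_minutes = sum(duration for _, duration in episodes)
        return sum(score * duration for score, duration in episodes) / total_minutes
    raise ValueError(f"unknown rule: {rule}")
```

Comparing the distribution each rule produces against the distribution of life-satisfaction scores is then the kind of test the paper uses to rank rules.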
Citations: 0
Examining Pre- and Post-Pandemic Cross-Border Trips Using Crowdsourced Data at the Second-Busiest US-Mexico Border Community
Pub Date : 2022-09-27 DOI: 10.32866/001c.38429
Erik Vargas, Okan Gurbuz, I. Sener, R. Aldrete
The US-Mexico border witnesses frequent cross-border travels for educational, recreational, healthcare, and work purposes, with millions of passenger and commercial vehicles crossing the international border each year. In 2020, pandemic-related travel restrictions were applied to non-US citizens at the US-Mexico border and reshaped cross-border trips. Using crowdsourced data, we explored the mobility changes that the COVID-19 pandemic brought to the second-busiest border region between the United States and Mexico. Results showed that although some patterns remained similar, overall mobility decreased significantly.
Citations: 0
An Interrupted Time Series Analysis of the Sociodemographics of Crash Victims during the Illinois Stay at Home Order
Pub Date : 2022-09-23 DOI: 10.32866/001c.38490
Mickey Edwards
The race/ethnicity and gender of motor vehicle crash victims during the 2020 Illinois stay-at-home order are compared to previous years. The median poverty rate of crash victims is compared across the five years 2016-2020, finding that poverty is strongly associated with Black male and female crash victims. Several contributing crash factors, such as speed, distracted driving, seat belt use, and intoxication, are also compared. Within each race/ethnicity, females significantly decreased their proportion of crash involvement while males significantly increased theirs. An interrupted time series analysis and a segmented binary logistic regression are used in conjunction with a presentation of summary statistics.
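An interrupted time series design of the kind this abstract mentions is typically fit as a segmented regression with four terms per observation: an intercept, a pre-existing time trend, a level shift at the intervention, and a slope change after it. The sketch below builds one design-matrix row; it is a generic textbook formulation, not the authors' exact specification.

```python
def its_design_row(t, t0):
    """One design-matrix row for a segmented (interrupted time series)
    regression: [intercept, time trend, post-intervention level shift,
    post-intervention slope change], with the intervention at time t0."""
    post = 1 if t >= t0 else 0
    return [1, t, post, post * (t - t0)]
```

Stacking these rows for each period (e.g. months, with t0 at the start of the stay-at-home order) and regressing the outcome on them yields the pre-trend, the immediate jump, and the trend change.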
Citations: 0