
Latest articles in Computer Speech and Language

Speech self-supervised representations benchmarking: A case for larger probing heads
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-03 · DOI: 10.1016/j.csl.2024.101695

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we found that altering the downstream architecture structure leads to significant fluctuations in the performance ranking of the evaluated models. Against common practices in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization, and multi-level feature exploitation.
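As a rough illustration of why probing-head capacity matters, the sketch below (hypothetical dimensions, not the paper's configurations) compares the parameter counts of a linear probe and a one-hidden-layer MLP probe mapping frozen SSL features to task labels:

```python
# Illustrative sketch only: parameter counts for two downstream probing
# heads over frozen SSL representations (dimensions are hypothetical).

def linear_probe_params(feat_dim: int, n_classes: int) -> int:
    # A single affine layer: weights + biases.
    return feat_dim * n_classes + n_classes

def mlp_probe_params(feat_dim: int, hidden: int, n_classes: int) -> int:
    # One hidden layer followed by a classifier layer.
    return (feat_dim * hidden + hidden) + (hidden * n_classes + n_classes)

# e.g. 768-dim features (typical of base-sized SSL encoders), 10 classes
print(linear_probe_params(768, 10))     # 7690
print(mlp_probe_params(768, 1024, 10))  # 797706
```

Even a modest hidden layer multiplies head capacity by roughly two orders of magnitude, which is the kind of change the study finds can reshuffle model rankings.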

Citations: 0
Improved relation extraction through key phrase identification using community detection on dependency trees
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-08-02 · DOI: 10.1016/j.csl.2024.101706

A method for extracting relations from sentences by utilizing their dependency trees to identify key phrases is presented in this paper. Dependency trees are commonly used in natural language processing to represent the grammatical structure of a sentence, and this approach builds upon this representation to extract meaningful relations between phrases. Identifying key phrases is crucial in relation extraction as they often indicate the entities and actions involved in a relation. The method uses community detection algorithms on the dependency tree to identify groups of related words that form key phrases, such as subject-verb-object structures. The experiments on the SemEval-2010 Task 8 dataset and the TACRED dataset demonstrate that the proposed method outperforms existing baseline methods.
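A minimal sketch of the general idea, not the authors' algorithm: treat each immediate subtree of the dependency root as a candidate key phrase (true community detection, e.g. modularity-based methods, would operate on the tree as a graph and could split or merge these groups):

```python
from collections import defaultdict

def subtree_phrases(heads, words):
    """Group words into candidate phrases: one group per immediate subtree
    of the dependency root. heads[i] is the head index of word i; -1 marks
    the root. Illustrative sketch only."""
    children = defaultdict(list)
    root = None
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)

    def collect(node):
        out = [node]
        for c in children[node]:
            out.extend(collect(c))
        return sorted(out)

    return [[words[i] for i in collect(c)] for c in children[root]]

# "dogs chase small cats": "chase" is the root, "small" modifies "cats"
print(subtree_phrases([1, -1, 3, 1], ["dogs", "chase", "small", "cats"]))
# [['dogs'], ['small', 'cats']]
```

Here the subject ("dogs") and object phrase ("small cats") fall out as separate groups, mirroring the subject-verb-object structures the paper targets.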

Citations: 0
Assessing language models’ task and language transfer capabilities for sentiment analysis in dialog data
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101704

Our work explores the differences between GRU-based and transformer-based approaches in the context of sentiment analysis on text dialog. In addition to the overall performance on the downstream task, we assess the knowledge transfer capabilities of the models by applying a thorough zero-shot analysis at the task level and on the cross-lingual performance across five European languages. The ability to generalize over different tasks and languages is of high importance, as the data needed for a particular application may be scarce or non-existent. We perform evaluations on both known benchmark datasets and a novel synthetic dataset for dialog data, containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We leverage the informative power of the conversational context, showing that appending the previous four utterances of the same speaker to the input sequence has the greatest benefit on the inference performance. The cross-lingual and cross-task evaluations have shown that the transformer-based models possess superior transfer abilities to the GRU model, especially in the zero-shot setting. Considering its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, alongside the classical BERT that obtained the highest zero-shot accuracy on the MELD dataset with 55.08%.
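The context-window finding (previous four utterances of the same speaker appended to the input) can be sketched as follows; the `[SEP]` joining convention and function name are assumptions for illustration, not the paper's implementation:

```python
def build_input(history, speaker, utterance, k=4):
    """Prepend the previous k utterances of the same speaker to the
    current utterance. `history` is a list of (speaker, utterance)
    pairs in chronological order. Hypothetical sketch."""
    context = [u for s, u in history if s == speaker][-k:]
    return " [SEP] ".join(context + [utterance])

history = [("A", "hi"), ("B", "hello"),
           ("A", "how are you"), ("A", "still there?")]
print(build_input(history, "A", "ok", k=2))
# how are you [SEP] still there? [SEP] ok
```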

Citations: 0
COfEE: A comprehensive ontology for event extraction from text
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-31 · DOI: 10.1016/j.csl.2024.101702

Large volumes of data are constantly being published on the web; however, the majority of this data is often unstructured, making it difficult to comprehend and interpret. To extract meaningful and structured information from such data, researchers and practitioners have turned to Information Extraction (IE) methods. One of the most challenging IE tasks is Event Extraction (EE), which involves extracting information related to specific incidents and their associated actors from text. EE has broad applications, including building a knowledge base, information retrieval, summarization, and online monitoring systems. Over the past few decades, various event ontologies, such as ACE, CAMEO, and ICEWS, have been developed to define event forms, actors, and dimensions of events observed in text. However, these ontologies have some limitations, such as covering only a few topics like political events, having inflexible structures in defining argument roles, lacking analytical dimensions, and insufficient gold-standard data. To address these concerns, we propose a new event ontology, COfEE, which integrates expert domain knowledge, previous ontologies, and a data-driven approach for identifying events from text. COfEE comprises two hierarchy levels (event types and event sub-types) that include new categories related to environmental issues, cyberspace, criminal activity, and natural disasters that require real-time monitoring. In addition, dynamic roles are defined for each event sub-type to capture various dimensions of events. The proposed ontology is evaluated on Wikipedia events, and it is shown to be comprehensive and general. Furthermore, to facilitate the preparation of gold-standard data for event extraction, we present a language-independent online tool based on COfEE. A gold-standard dataset, annotated by ten human experts and consisting of 24,000 Persian news articles labeled according to the COfEE ontology, is also prepared. To diversify the data, news articles from the Wikipedia event portal and the 100 most popular Persian news agencies between 2008 and 2021 were collected. Finally, we introduce a supervised method based on deep learning techniques to automatically extract relevant events and their corresponding actors.

Citations: 0
Conversations in the wild: Data collection, automatic generation and evaluation
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-30 · DOI: 10.1016/j.csl.2024.101699

The aim of conversational speech processing is to analyze human conversations in natural settings. It finds numerous applications in personality traits identification, speech therapy, speaker identification and verification, speech emotion detection, and speaker diarization. However, large-scale annotated datasets required for feature extraction and conversational model training only exist for a handful of languages (e.g. English, Mandarin, and French) as the gathering, cleaning, and annotation of such datasets is tedious, time-consuming, and expensive. We propose two scalable, language-agnostic algorithms for automatically generating multi-speaker, variable-length, spontaneous conversations. These algorithms synthesize conversations using existing non-conversational speech datasets. We also contribute the resulting datasets (283 hours, 50 speakers). As a comparison, we also gathered the first spontaneous conversational dataset for Urdu (24 hours, 212 speakers) from public talk shows. Using speaker diarization as an example, we evaluate our datasets and report the first baseline diarization error rates (DER) for Urdu (25% for synthetic dataset-based models, and 29% for natural conversations). Our conversational speech generation technique allows training speaker diarization pipelines without the need for preparing huge conversational repositories.
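In outline, the synthesis idea could look like the following hypothetical sketch (not the paper's two algorithms): interleave clips from a non-conversational corpus across speakers with randomized turn lengths and pauses:

```python
import random

def synthesize_conversation(clips_by_speaker, n_turns, seed=0):
    """Build a toy conversation timeline from non-conversational clips.
    clips_by_speaker maps each speaker to a list of clip identifiers;
    durations and pauses are random placeholders. Illustrative only."""
    rng = random.Random(seed)
    speakers = list(clips_by_speaker)
    timeline, t, prev = [], 0.0, None
    for _ in range(n_turns):
        # avoid back-to-back turns by the same speaker
        spk = rng.choice([s for s in speakers if s != prev])
        clip = rng.choice(clips_by_speaker[spk])
        timeline.append((round(t, 2), spk, clip))
        t += rng.uniform(1.0, 5.0) + rng.uniform(0.1, 0.7)  # clip + pause
        prev = spk
    return timeline
```

A diarization ground truth falls out for free: each timeline entry records who speaks when, which is what makes such synthetic data usable for training diarization pipelines.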

Citations: 0
Prompting large language models for user simulation in task-oriented dialogue systems
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-26 · DOI: 10.1016/j.csl.2024.101697

Large Language Models (LLMs) have gained widespread popularity due to their instruction-following abilities. In this study, we evaluate their ability to simulate user interactions for task-oriented dialogue (TOD) systems. Our findings demonstrate that prompting LLMs reveals their promising capabilities for training and testing dialogue policies, eliminating the need for domain expertise in crafting complex rules or relying on large annotated data, as required by traditional simulators. The results show that the dialogue system trained with the ChatGPT simulator achieves a success rate of 59%, comparable to a 62% success rate of the dialogue system trained with the manual-rules, agenda-based user simulator (ABUS). Furthermore, the dialogue system trained with the ChatGPT simulator demonstrates better generalization ability compared to the dialogue system trained with the ABUS. Its success rate outperforms that of the dialogue system trained with the ABUS by 4% on GenTUS, 5% on the ChatGPT simulator, and 3% on the Llama simulator. Nevertheless, LLM-based user simulators provide a challenging environment, with lexically rich, diverse, and random responses. The Llama simulator outperforms the human reference in all lexical diversity metrics with a margin of 0.66 in SE, 0.39 in CE, 0.01 in MSTTR, 0.04 in HDD, and 0.55 in MTLD, while the ChatGPT simulator achieves comparable results. This ultimately contributes to enhancing the system’s ability to generalize more effectively.
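Of the lexical diversity metrics reported, MSTTR (mean segmental type-token ratio) is easy to state concretely; a minimal implementation follows (the segment size is a free parameter; 50 is a common default, not necessarily the paper's setting):

```python
def msttr(tokens, seg=50):
    """Mean segmental type-token ratio: average the type/token ratio over
    consecutive non-overlapping segments of `seg` tokens, discarding the
    trailing partial segment. Assumes len(tokens) >= seg."""
    n = len(tokens) - len(tokens) % seg
    segments = [tokens[i:i + seg] for i in range(0, n, seg)]
    return sum(len(set(s)) / len(s) for s in segments) / len(segments)

print(msttr(["a", "a", "b", "c"], seg=2))  # (0.5 + 1.0) / 2 = 0.75
```

Segmenting before averaging is what makes MSTTR robust to text length, unlike a raw type-token ratio, which shrinks as texts grow.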

Citations: 0
Demystifying large language models in second language development research
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-26 · DOI: 10.1016/j.csl.2024.101700

Evaluating students' textual responses is a common and critical task in language research and education practice. However, manual assessment can be tedious and may lack consistency, posing challenges for both scientific discovery and frontline teaching. Leveraging state-of-the-art large language models (LLMs), we aim to define and operationalize LLM-Surprisal, a numeric representation of the interplay between lexical diversity and syntactic complexity, and to empirically and theoretically demonstrate its relevance for automatic writing assessment and Chinese L2 (second language) learners’ English writing development. We developed an LLM-based natural language processing pipeline that can automatically compute text Surprisal scores. By comparing Surprisal metrics with the widely used classic indices in L2 studies, we extended the usage of computational metrics in Chinese learners’ L2 English writing. Our analyses suggested that LLM-Surprisals can distinguish L2 from L1 (first language) writing, index L2 development stages, and predict scores provided by human professionals. This indicated that the Surprisal dimension may capture critical aspects of L2 development. The relative advantages and disadvantages of these approaches were discussed in depth. We concluded that LLMs are promising tools that can enhance L2 research. Our showcase paves the way for more nuanced approaches to computationally assessing and understanding L2 development. Our pipelines and findings will inspire language teachers, learners, and researchers to operationalize LLMs in an innovative and accessible manner.
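Although the paper's LLM-Surprisal is computed from an LLM's token probabilities, the core quantity is simple; a sketch assuming the per-token probabilities have already been obtained from some model:

```python
import math

def mean_surprisal(token_probs):
    """Average surprisal of a text: the surprisal of token i is
    -log2 P(token_i | preceding context). Inputs are hypothetical
    per-token probabilities from a language model."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)

print(mean_surprisal([0.5, 0.25, 0.125]))  # (1 + 2 + 3) / 3 = 2.0
```

Higher values mean the model found the text less predictable, which is the signal the paper relates to lexical diversity and syntactic complexity.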

Citations: 0
The effect of preference elicitation methods on the user experience in conversational recommender systems
IF 3.1 · CAS Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-25 · DOI: 10.1016/j.csl.2024.101696

The prevalence of conversational interfaces is rising rapidly, since improved algorithms allow for remarkable proficiency in understanding and generating natural language. This also holds for Conversational Recommender Systems (CRSs), which benefit from information provided by the user over the course of the dialogue to offer personalized recommendations. However, the challenge remains to elicit the user's characteristics and preferences in a way that yields the best user experience. Hence, the current research investigated the effect of different Preference Elicitation (PE) methods on the user experience of a CRS. We introduce two axes along which PE methods can be classified, namely the degree of system prompt guidance and the level of user input restriction. We built three versions of a CRS and conducted a between-subjects experiment comparing three conditions: high guidance-high restriction, high guidance-low restriction, and low guidance-low restriction. We tested their effect on ten user experience constructs with 66 European participants, all working in agriculture or forestry.

The study found no significant effects of the three preference elicitation methods on any of the user experience constructs collected through questionnaires. However, we did find significant differences in the objective measures of chat duration (Speed), response time (Cognitive Demand), and recommendation performance (Accuracy of Recommended Items). Regarding recommendation performance, the conditions with high guidance led to a higher match score than the condition with low guidance. The certainty score was highest in the condition with high guidance and high input restriction. Finally, a question at the end of the conversation showed that users who were satisfied with the recommendation responded more positively to six out of ten user experience constructs. This suggests that satisfaction with recommendation performance is a crucial factor in the user experience of CRSs.
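The two classification axes and three experimental conditions named in the abstract can be sketched as a small data model; the axis and condition names come from the abstract, while the class and field names are illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    LOW = "low"
    HIGH = "high"

@dataclass(frozen=True)
class PEMethod:
    """A preference elicitation method, classified along the two axes
    introduced in the paper."""
    prompt_guidance: Level    # how strongly the system steers the dialogue
    input_restriction: Level  # how constrained the user's replies are

# The three between-subjects conditions compared in the study.
conditions = {
    "high guidance-high restriction": PEMethod(Level.HIGH, Level.HIGH),
    "high guidance-low restriction": PEMethod(Level.HIGH, Level.LOW),
    "low guidance-low restriction": PEMethod(Level.LOW, Level.LOW),
}
print(len(conditions))  # 3
```

Note that the fourth cell of the grid (low guidance-high restriction) was not tested, which is why the design compares three conditions rather than a full 2x2.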

{"title":"The effect of preference elicitation methods on the user experience in conversational recommender systems","authors":"","doi":"10.1016/j.csl.2024.101696","DOIUrl":"10.1016/j.csl.2024.101696","url":null,"abstract":"<div><p>The prevalence of conversational interfaces is rapidly rising, since improved algorithms allow for remarkable proficiency in understanding and generating natural language. This also holds for Conversational Recommender Systems (CRS), that benefit from information being provided by the user in the course of the dialogue to offer personalized recommendations. However, the challenge remains eliciting the user's characteristics and preferences in a way that leads to the most optimal user experience. Hence, the current research was aimed at investigating the effect of different Preference Elicitation (PE) methods on the user experience of a CRS. We introduce two axes across which PE methods can be classified, namely the degree of system prompt guidance and the level of user input restriction. We built three versions of a CRS to conduct a between-subjects experiment which compared three conditions: high guidance-high restriction, high guidance-low restriction and low guidance-low restriction. We tested their effect on ten constructs of user experience measures on 66 European participants, all working in agriculture or forestry.</p><p>The study did not find any significant effects of the three preference elicitation methods on all user experience constructs collected through questionnaires. However, we did find significant differences in terms of the objective measures chat duration (Speed), response time (Cognitive Demand) and recommendation performance (Accuracy of Recommended Items). Regarding the recommendation performance, it was found that the preference elicitation methods with high guidance led to a higher match score than the condition with low guidance. 
The certainty score was highest in the condition with high guidance and high input restriction. Finally, we found through a question at the end of the conversation that users who were satisfied with the recommendation responded more positively to six out of ten user experience constructs. This suggests that satisfaction with the recommendation performance is a crucial factor in the user experience of CRSs.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000792/pdfft?md5=2468411a22f6c0a2ba9f84281b96dacc&pid=1-s2.0-S0885230824000792-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Theory of mind performance of large language models: A comparative analysis of Turkish and English
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-07-25 DOI: 10.1016/j.csl.2024.101698

Theory of mind (ToM), the ability to understand others' mental states, is a defining human skill. Research assessing LLMs' ToM performance has yielded conflicting findings, prompting debate about whether and how they can show ToM understanding. Psychological research indicates that the characteristics of a specific language can influence how mental states are represented and communicated. Thus, it is reasonable to expect language characteristics to influence how LLMs communicate with humans, especially when the conversation involves references to mental states. This study examines how these characteristics affect LLMs' ToM performance by evaluating GPT-3.5 and GPT-4 in English and Turkish. Turkish provides an excellent contrast to English, since Turkish has a different syntactic structure and special verbs, san- and zannet-, meaning "falsely believe." Using OpenAI's Chat Completion API, we collected responses from GPT models for first- and second-order ToM scenarios in English and Turkish. Our approach combined completion prompts and open-ended questions within the same chat session, offering deep insights into the models' reasoning processes. Our data showed that while GPT models can respond accurately to standard ToM tasks (100% accuracy), their performance deteriorates (to below chance level) with slight modifications. This high sensitivity suggests a lack of robustness in ToM performance. GPT-4 outperformed its predecessor, GPT-3.5, showing some improvement in ToM performance. The models generally performed better when tasks were presented in English than in Turkish. These findings indicate that GPT models cannot yet reliably pass first-order and second-order ToM tasks in either language. The findings have significant implications for the explainability of LLMs, highlighting the challenges and biases they face when simulating human-like ToM understanding in different languages.
{"title":"Theory of mind performance of large language models: A comparative analysis of Turkish and English","authors":"","doi":"10.1016/j.csl.2024.101698","DOIUrl":"10.1016/j.csl.2024.101698","url":null,"abstract":"<div><p>Theory of mind (ToM), understanding others’ mental states, is a defining skill belonging to humans. Research assessing LLMs’ ToM performance yields conflicting findings and leads to discussions about whether and how they could show ToM understanding. Psychological research indicates that the characteristics of a specific language can influence how mental states are represented and communicated. Thus, it is reasonable to expect language characteristics to influence how LLMs communicate with humans, especially when the conversation involves references to mental states. This study examines how these characteristics affect LLMs’ ToM performance by evaluating GPT 3.5 and 4 performances in English and Turkish. Turkish provides an excellent contrast to English since Turkish has a different syntactic structure and special verbs, san- and zannet-, meaning “falsely believe.” Using Open AI's Chat Completion API, we collected responses from GPT models for first- and second-order ToM scenarios in English and Turkish. Our innovative approach combined completion prompts and open-ended questions within the same chat session, offering deep insights into models’ reasoning processes. Our data showed that while GPT models can respond accurately to standard ToM tasks (100% accuracy), their performance deteriorates (below chance level) with slight modifications. This high sensitivity suggests a lack of robustness in ToM performance. GPT 4 outperformed its predecessor, GPT 3.5, showing improvement in ToM performance to some extent. The models generally performed better when tasks were presented in English than in Turkish. These findings indicate that GPT models cannot reliably pass first-order and second-order ToM tasks in either of the languages yet. 
The findings have significant implications for <em>Explainability</em> of LLMs by highlighting challenges and biases that they face when simulating human-like ToM understanding in different languages.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000810/pdfft?md5=e4a1b003e652ef2e0a652d3d4eaf2c3d&pid=1-s2.0-S0885230824000810-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141848847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ChatMatch: Exploring the potential of hybrid vision–language deep learning approach for the intelligent analysis and inference of racket sports
IF 3.1 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-07-25 DOI: 10.1016/j.csl.2024.101694

Video understanding technology has become increasingly important across disciplines, yet current approaches have focused primarily on lower levels of video comprehension, making it difficult to provide comprehensive, professional insights at a higher level. Video analysis plays a crucial role in athlete training and strategy development in racket sports. This study demonstrates an innovative, higher-level video comprehension framework (ChatMatch), which integrates computer vision technologies with cutting-edge large language models (LLMs) to enable intelligent analysis and inference over racket sports videos. To examine the feasibility of this framework, we deployed a prototype of ChatMatch for badminton. A vision-based encoder was first proposed to extract meta-features, including the locations, actions, gestures, and action results of players in each frame of match videos, followed by a rule-based decoding method that transforms the extracted information into both structured and unstructured knowledge. A set of LLM-based agents, namely a task identifier, a coach agent, a statistician agent, and a video manager, was developed through prompt engineering and driven by an automated mechanism. Automatic collaborative interaction among the agents enabled comprehensive responses to professional inquiries from users. The validation findings showed that our vision models performed excellently in meta-feature extraction, achieving a location identification accuracy of 0.991, an action recognition accuracy of 0.902, and a gesture recognition accuracy of 0.950. Additionally, a total of 100 questions was gathered from four proficient badminton players and one coach to evaluate the performance of the LLM-based agents, and ChatMatch produced commendable results across general inquiries, statistical queries, and video retrieval tasks. These findings highlight the potential of this approach, which can offer valuable insights for athletes and coaches while significantly improving the efficiency of sports video analysis.
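The rule-based decoding step (per-frame meta-features turned into structured knowledge) might look roughly like the following; the feature fields come from the abstract, while the aggregation rules, field values, and names are illustrative:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FrameFeatures:
    """Meta-features extracted per frame by the vision encoder:
    player identity, court location, action, and action result."""
    player: str
    location: tuple[float, float]
    action: str
    result: str  # e.g. "in_play", "point_won"

def decode_structured(frames: list[FrameFeatures]) -> dict:
    """Rule-based decoding: aggregate per-frame features into structured
    knowledge (per-player action counts and points won), which downstream
    agents such as a statistician agent could then query."""
    actions = Counter((f.player, f.action) for f in frames)
    points = Counter(f.player for f in frames if f.result == "point_won")
    return {"action_counts": dict(actions), "points_won": dict(points)}

frames = [
    FrameFeatures("A", (1.0, 2.0), "smash", "point_won"),
    FrameFeatures("B", (4.0, 1.0), "clear", "in_play"),
    FrameFeatures("A", (1.5, 2.2), "drop", "in_play"),
]
print(decode_structured(frames)["points_won"])  # {'A': 1}
```

The unstructured counterpart would be a natural-language rendering of the same events (e.g. a rally commentary string) passed to the LLM agents as context; only the structured half is sketched here.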

{"title":"ChatMatch: Exploring the potential of hybrid vision–language deep learning approach for the intelligent analysis and inference of racket sports","authors":"","doi":"10.1016/j.csl.2024.101694","DOIUrl":"10.1016/j.csl.2024.101694","url":null,"abstract":"<div><p>Video understanding technology has become increasingly important in various disciplines, yet current approaches have primarily focused on lower comprehension level of video content, posing challenges for providing comprehensive and professional insights at a higher comprehension level. Video analysis plays a crucial role in athlete training and strategy development in racket sports. This study aims to demonstrate an innovative and higher-level video comprehension framework (ChatMatch), which integrates computer vision technologies with the cutting-edge large language models (LLM) to enable intelligent analysis and inference of racket sports videos. To examine the feasibility of this framework, we deployed a prototype of ChatMatch in the badminton in this study. A vision-based encoder was first proposed to extract the meta-features included the locations, actions, gestures, and action results of players in each frame of racket match videos, followed by a rule-based decoding method to transform the extracted information in both structured knowledge and unstructured knowledge. A set of LLM-based agents included namely task identifier, coach agent, statistician agent, and video manager, was developed through a prompt engineering and driven by an automated mechanism. The automatic collaborative interaction among the agents enabled the provision of a comprehensive response to professional inquiries from users. The validation findings showed that our vision models had excellent performances in meta-feature extraction, achieving a location identification accuracy of 0.991, an action recognition accuracy of 0.902, and a gesture recognition accuracy of 0.950. 
Additionally, a total of 100 questions were gathered from four proficient badminton players and one coach to evaluate the performance of the LLM-based agents, and the outcomes obtained from ChatMatch exhibited commendable results across general inquiries, statistical queries, and video retrieval tasks. These findings highlight the potential of using this approach that can offer valuable insights for athletes and coaches while significantly improve the efficiency of sports video analysis.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000779/pdfft?md5=2c72701b559ac872232548320e08722b&pid=1-s2.0-S0885230824000779-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141853772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0