Atiya Usmani , Saeed Hamood Alsamhi , Muhammad Jaleed Khan , John Breslin , Edward Curry
MuSe-CarASTE: A comprehensive dataset for aspect sentiment triplet extraction in automotive review videos
Expert Systems with Applications, Volume 262, Article 125695. Published 2024-11-06. DOI: 10.1016/j.eswa.2024.125695
https://www.sciencedirect.com/science/article/pii/S0957417424025624
Citations: 0
Abstract
In the Aspect-Based Sentiment Analysis (ABSA) domain, the Aspect Sentiment Triplet Extraction (ASTE) task has emerged as a pivotal endeavor, offering insights into nuanced aspects, opinions, and sentiment relationships. This paper introduces “MuSe-CarASTE”, an extensive and meticulously curated dataset purpose-built to propel ASTE advancements within the automotive domain. The core emphasis of MuSe-CarASTE is on aspect, opinion, and sentiment triplets, facilitating a comprehensive analysis of product reviews. Comprising transcripts from MuSe-Car’s automotive video reviews, MuSe-CarASTE presents a substantial collection of 28,295 sentences organized into 5,500 segments. Each segment is meticulously annotated with multiple aspects, opinions, and sentiment labels, offering unprecedented granularity for ASTE tasks. The percentage agreement between triplets annotated by different annotators over a randomly sampled subset of the dataset is 79.74%, at similarity threshold τ = 0.60. We also experimented with four baseline models on our dataset and report the results. The distinctiveness of the dataset emerges from its extension into the automotive domain, shedding light on sentiment dynamics specific to vehicles. With the fusion of extensive content and real-world applicability, MuSe-CarASTE presents a fertile ground for Natural Language Processing (NLP) innovation. Researchers, practitioners, and data scientists can harness MuSe-CarASTE to build and evaluate NLP models tailored for challenges in ASTE. These challenges encompass intricate aspect-opinion relationships, multi-word aspect and opinion extraction, and the subtleties of vague language. Moreover, including aspects that do not appear verbatim in the sentences introduces a practical dimension to our dataset, enabling real-world applications like review pattern analysis, summarization, and recommender system enhancement. As a pioneering benchmark for NLP model evaluation in ABSA, MuSe-CarASTE integrates content richness, real-world context, and sentiment complexity. This integration empowers the development of accurate, adaptable, and insightful sentiment analysis models within the automotive review landscape.
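To make the triplet structure and the thresholded agreement figure more concrete, the following is a minimal Python sketch. The abstract does not specify the similarity measure or the matching procedure, so everything here is an assumption for illustration: the `Triplet` class, the use of `difflib.SequenceMatcher` as the string similarity, the requirement that sentiment labels match exactly, and the one-directional definition of agreement are all hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Sequence


@dataclass(frozen=True)
class Triplet:
    """One (aspect, opinion, sentiment) annotation. Hypothetical structure."""
    aspect: str
    opinion: str
    sentiment: str  # e.g. "positive", "negative", "neutral"


def text_similarity(a: str, b: str) -> float:
    # Assumption: character-level ratio from difflib stands in for
    # whatever similarity measure the paper actually uses.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triplets_match(x: Triplet, y: Triplet, tau: float = 0.60) -> bool:
    # Assumption: sentiments must be identical, while aspect and opinion
    # spans only need to reach the similarity threshold tau.
    return (
        x.sentiment == y.sentiment
        and text_similarity(x.aspect, y.aspect) >= tau
        and text_similarity(x.opinion, y.opinion) >= tau
    )


def percentage_agreement(
    ann_a: Sequence[Triplet], ann_b: Sequence[Triplet], tau: float = 0.60
) -> float:
    """Percent of annotator A's triplets with at least one match in B's."""
    if not ann_a:
        return 0.0
    matched = sum(
        any(triplets_match(t, u, tau) for u in ann_b) for t in ann_a
    )
    return 100.0 * matched / len(ann_a)


if __name__ == "__main__":
    # Invented example annotations, not taken from the dataset.
    annotator_a = [
        Triplet("steering wheel", "nice", "positive"),
        Triplet("boot space", "tiny", "negative"),
    ]
    annotator_b = [
        Triplet("steering", "nice", "positive"),
        Triplet("fuel economy", "poor", "negative"),
    ]
    print(percentage_agreement(annotator_a, annotator_b))
```

Under these assumptions, a lower τ tolerates looser span boundaries between annotators (such as "steering" versus "steering wheel"), while a higher τ approaches exact matching. A symmetric variant could average the score in both directions.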
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.