Facilitating topic modeling in tourism research: Comprehensive comparison of new AI technologies

IF 12.4 Zone 1 (Management) Q1 ENVIRONMENTAL STUDIES Tourism Management Pub Date: 2024-07-31 DOI: 10.1016/j.tourman.2024.105007

Andrei P. Kirilenko, Svetlana Stepchenkova



Abstract

In the past few years, a new crop of transformer-based language models, such as Google's BERT and OpenAI's ChatGPT, has become increasingly popular in text analysis, owing its success to the ability to capture the context of an entire document. These new methods, however, have yet to percolate into the tourism academic literature. This paper aims to fill this gap by comparing these instruments against the commonly used Latent Dirichlet Allocation (LDA) for topic extraction from contrasting tourism-related data: coherent vs. noisy, short vs. long, and small vs. large corpus size. The data are typical of the tourism literature and include comments from followers of a popular blogger, TripAdvisor reviews, and review titles. We recommend data domains where the reviewed methods perform best, consider dimensions of success, and discuss each method's strengths and weaknesses. In general, GPT tends to return topics that are comprehensive, highly interpretable, and relevant to the real world for all datasets, including the noisy ones, and at all scales. Meanwhile, ChatGPT is the most vulnerable to the issue of trust common to “black box” models, which we explore in detail.


Source journal: Tourism Management

CiteScore: 24.10

Self-citation rate: 7.90%

Articles published: 190

Review time: 45 days

Journal description: Tourism Management is a preeminent scholarly journal covering the management of travel and tourism, including planning and policy. Taking an interdisciplinary perspective, the journal addresses management challenges in international, national, and regional tourism. Its content reflects this integrative approach and features primary research articles, progress in tourism research, case studies, research notes, discussions of current issues, and book reviews. Emphasizing scholarly rigor, all published papers are expected to contribute to theoretical and/or methodological advancement while offering specific insights relevant to tourism management and policy.