Automating the Information Extraction from Semi-Structured Interview Transcripts

ArXiv Pub Date: 2024-03-07 DOI: 10.1145/3589335.3651230

Angelina Parfenova

Citations: 0

Abstract

This paper explores the development and application of an automated system for extracting information from semi-structured interview transcripts. Because traditional qualitative analysis methods, such as coding, are labor-intensive, there is significant demand for tools that support the analysis process. Our research investigates various topic modeling techniques and concludes that the best model for analyzing interview texts is a combination of BERT embeddings and HDBSCAN clustering. We present a user-friendly software prototype that enables researchers, including those without programming skills, to efficiently process and visualize the thematic structure of interview data. The tool not only supports the initial stages of qualitative analysis but also offers insight into how the revealed topics are interconnected, thereby adding depth to the analysis.
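The abstract names the winning combination (BERT embeddings followed by HDBSCAN clustering) without showing how the pieces fit together. The following is a minimal sketch of that general pattern, not the paper's actual implementation. The function names (`group_by_topic`, `bert_embed`, `hdbscan_cluster`), the model name `all-MiniLM-L6-v2`, and the `min_cluster_size` value are illustrative assumptions, and the two helper functions assume the third-party `sentence-transformers`, `hdbscan`, and `numpy` packages are installed.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def group_by_topic(
    segments: Sequence[str],
    embed: Callable[[Sequence[str]], Sequence],
    cluster: Callable[[Sequence], Sequence[int]],
) -> Dict[int, List[str]]:
    """Embed each transcript segment, cluster the vectors, group text by label."""
    if not segments:
        return {}
    vectors = embed(segments)   # one vector per segment
    labels = cluster(vectors)   # one integer label per vector
    if len(labels) != len(segments):
        raise ValueError("cluster() must return one label per segment")
    topics: Dict[int, List[str]] = defaultdict(list)
    for segment, label in zip(segments, labels):
        topics[int(label)].append(segment)
    return dict(topics)


def bert_embed(segments: Sequence[str]):
    """Hypothetical helper: encode segments with a BERT-family sentence encoder."""
    # Assumption: the model name is an illustrative choice, not the paper's.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(list(segments))


def hdbscan_cluster(vectors, min_cluster_size: int = 5):
    """Hypothetical helper: density-based clustering; label -1 marks noise."""
    import hdbscan
    import numpy as np

    # Cast to float64 so the clusterer receives double-precision input.
    data = np.asarray(vectors, dtype="float64")
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(data)
```

With the optional packages installed, `group_by_topic(segments, bert_embed, hdbscan_cluster)` would return a dictionary mapping each cluster label to its segments. HDBSCAN assigns the label `-1` to points it treats as noise, so that key collects segments that did not fit any topic. Passing the embedding and clustering steps as arguments keeps the grouping logic independent of any particular model.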



