Identification of social scientifically relevant topics in an interview repository: a natural language processing experiment

IF 1.7 | Zone 3 (Management) | JCR Q2 (Information Science & Library Science) | Journal of Documentation | Pub Date: 2023-10-13 | DOI: 10.1108/jd-12-2022-0269
Judit Gárdos, Julia Egyed-Gergely, Anna Horváth, Balázs Pataki, Roza Vajda, András Micsik

Abstract

Purpose: The present study is about generating metadata to enhance thematic transparency and facilitate research on interview collections at the Research Documentation Centre, Centre for Social Sciences (TK KDK) in Budapest. It explores the use of artificial intelligence (AI) in producing, managing and processing social science data and its potential to generate useful metadata to describe the contents of such archives on a large scale.

Design/methodology/approach: The authors combined manual and automated/semi-automated methods of metadata development and curation. The authors developed a suitable domain-oriented taxonomy to classify a large text corpus of semi-structured interviews. To this end, the authors adapted the European Language Social Science Thesaurus (ELSST) to produce a concise, hierarchical structure of topics relevant in social sciences. The authors identified and tested the most promising natural language processing (NLP) tools supporting the Hungarian language. The results of manual and machine coding will be presented in a user interface.

Findings: The study describes how an international social scientific taxonomy can be adapted to a specific local setting and tailored to be used by automated NLP tools. The authors show the potential and limitations of existing and new NLP methods for thematic assignment. The current possibilities of multi-label classification in social scientific metadata assignment are discussed, i.e. the problem of automated selection of relevant labels from a large pool.

Originality/value: Interview materials have not yet been used for building manually annotated training datasets for automated indexing of scientifically relevant topics in a data repository. Comparing various automated-indexing methods, this study shows a possible implementation of a researcher tool supporting custom visualizations and the faceted search of interview collections.
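To make the multi-label problem named in the abstract concrete, the following is a minimal sketch of selecting several relevant labels from a pool organised as a two-level taxonomy. It is not the authors' method: the taxonomy entries, the cue words, the function names `tokenize` and `assign_labels`, and the `threshold` parameter are all hypothetical, and simple cue-word counting stands in for the NLP tools the study evaluated. English text is used for readability; the study's corpus is Hungarian.

```python
import re
from collections import Counter

# Hypothetical two-level taxonomy: topic -> subtopic -> cue words.
# Illustrative only; these entries are NOT taken from ELSST or from
# the authors' adapted taxonomy.
TAXONOMY = {
    "Labour and employment": {
        "Unemployment": {"unemployed", "jobless", "layoff"},
        "Working conditions": {"shift", "overtime", "wage"},
    },
    "Education": {
        "Schools": {"school", "teacher", "pupil"},
    },
    "Housing": {
        "Rented housing": {"rent", "landlord", "tenant"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def assign_labels(text, threshold=2):
    """Return (topic, subtopic, score) for every subtopic whose cue words
    occur at least `threshold` times in the text.

    Any number of labels can be returned, including none, which is what
    distinguishes multi-label assignment from single-label classification.
    """
    counts = Counter(tokenize(text))
    labels = []
    for topic, subtopics in TAXONOMY.items():
        for subtopic, cues in subtopics.items():
            score = sum(counts[word] for word in cues)
            if score >= threshold:
                labels.append((topic, subtopic, score))
    # Highest-scoring labels first; ties keep taxonomy order.
    return sorted(labels, key=lambda label: -label[2])
```

Calling `assign_labels` on an interview passage returns every subtopic that clears the threshold together with its parent topic, so one passage can carry labels from several branches of the hierarchy. Lowering `threshold` trades precision for recall, which is one simple way to see why picking the right labels from a large pool is hard.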
Source journal
Journal of Documentation (Information Science & Library Science)
CiteScore: 4.20
Self-citation rate: 14.30%
Articles published: 72
Journal description: The scope of the Journal of Documentation is broadly information sciences, encompassing all of the academic and professional disciplines which deal with recorded information. These include, but are certainly not limited to:
■ Information science, librarianship and related disciplines
■ Information and knowledge management
■ Information and knowledge organisation
■ Information seeking and retrieval, and human information behaviour
■ Information and digital literacies