Mining Art History: Bulk Converting Nonstandard PDFs to Text to Determine the Frequency of Citations and Key Terms in Humanities Articles

A. Wasielewski, A. Dahlgren
Published in: Digital Human Sciences: New Objects – New Approaches, 2021-06-08 (journal article)
DOI: 10.16993/BBK.L (https://doi.org/10.16993/BBK.L)

Abstract

Text mining in art history scholarship can tell us about the discipline itself, as well as the artistic concerns of any given moment. The aim of this study is to develop and test a strategy for text mining PDFs of journal articles that have nonstandard formatting and/or use notes rather than full bibliographies for references. While articles in the natural and social sciences typically adhere to standard formats, art history journals employ a variety of formatting styles, which makes bulk capture of citations and other textual data from the articles challenging. This study outlines a method by which researchers can extract data from journal articles, using a sample set from art history. Once extracted, the data can be used to compare frequently used terms across samples and to determine which scholars are most cited in either the bibliographies or the main body text of articles. If the structure and layout of individual journals are carefully considered and the data are properly cleaned, citations and key terms can yield a clear picture of the disciplinary influences and dependencies of the scholarship.
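
The counting step described in the abstract can be illustrated with a minimal sketch. It assumes the PDF text has already been extracted to a plain string. The abstract does not name the authors' tools, so the function names, the cleaning rules, and the sample text below are all hypothetical and are not the study's actual pipeline.

```python
import re
from collections import Counter
from typing import Iterable


def clean_text(raw: str) -> str:
    """Undo two common kinds of PDF-extraction damage before counting."""
    # Rejoin words hyphenated across a line break. Caveat: this also fuses
    # genuine compounds that happened to break at the hyphen.
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse newlines and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def term_frequencies(text: str, terms: Iterable[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample standing in for the text extracted from one article.
sample = (
    "Foucault's account of the archive informs recent photo-\n"
    "graphy studies. See Foucault, 1969; Barthes, 1980.\n"
    "The archive recurs throughout."
)

cleaned = clean_text(sample)
key_terms = term_frequencies(cleaned, ["archive", "photography"])
cited = term_frequencies(cleaned, ["Foucault", "Barthes"])

print(key_terms.most_common())
print(cited.most_common())
```

Matching on surnames alone is only a rough proxy for citation counts: it cannot separate scholars who share a surname, and it treats mentions in the notes and in the body text alike unless the two are split beforehand.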