The evolving scholarly record: new uses and new forms

Proceedings of the ACM Symposium on Document Engineering. Pub Date: 2014-09-16. DOI: 10.1145/2644866.2644900

C. Lynch



Abstract

This presentation will take a very broad view of the emergence of literary corpora as objects of computation, with a particular focus on the various literatures and genres that form the scholarly record. The developments and implications here that I will explore include: the evolution of the scholarly literature into a semi-structured network of information used by both human readers and computational agents through the introduction of markup technologies; the interpenetration and interweaving of data and evidence with the literature; and the creation of an invisible infrastructure of names, taxonomies and ontologies, and the challenges this presents.

Primary forms of computation on this corpus include both comprehensive text mining and stream analysis (focused on what's new and what's changing as the base of literature and related factual databases expand with reports of new discoveries). I'll explore some of the developments in this area, including some practical considerations about platforms, licensing, and access.

As the use of the literature evolves, so do the individual genres that comprise it. Today's typical digital journal article looks almost identical to one half a century old, except that it is viewed on screen and printed on demand. Yet there is a great deal of activity driven by the move to data and computationally intensive scholarship, demands for greater precision and replicability in scientific communication, and related sources to move journal articles "beyond the PDF," reconsidering relationships among traditional texts, software, workflows, data and the broad cultural record in its role as evidence. I'll look briefly at some of these developments, with particular focus on what this may mean for the management of the scholarly record as a whole, and also briefly discuss some parallel challenges emerging in scholarly monographs.

Finally, I will close with a very brief discussion of what might be called corpus-scale thinking with regard to the scholarly record at the disciplinary level. I'll briefly discuss the findings of a 2014 National Research Council study that I co-chaired dealing with the future of the mathematics literature and the possibility of creating a global digital mathematics library, as well as offering some comments on developments in the life sciences. I will also consider the emergence of new corpus-wide tools and standards, such as Web-scale annotation, and some of their implications.

