
arXiv - CS - Digital Libraries: Latest Articles

Examining Different Research Communities: Authorship Network
Pub Date : 2024-08-24 DOI: arxiv-2409.00081
Shrabani Ghosh
Google Scholar is one of the top search engines for accessing scholarly research articles across multiple disciplines. Google Scholar's advanced search option makes it possible to extract articles based on phrases, publisher names, author names, time span, etc. In this work, we collected Google Scholar data (2000-2021) for two different research domains in computer science: Data Mining and Software Engineering. The scholar database resources are powerful for network analysis, data mining, and identifying links between authors via the authorship network. We examined the co-authorship network for each domain and studied their network structure. Extensive experiments are performed to analyze publication trends and identify influential authors and affiliated organizations for each domain. The network analysis shows that the networks' features are distinct from one another and exhibit small communities within the influential authors of a particular domain.
Citations: 0
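The co-authorship network described in the abstract above can be illustrated with a small sketch. This is not the author's code: the paper list is made up, the function names `build_coauthor_graph` and `communities` are hypothetical, and connected components are used only as a crude stand-in for whatever community analysis the paper performs.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return {author: {coauthor: number_of_shared_papers}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def communities(graph):
    """Connected components (an assumed, simplified notion of 'community')."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical author lists, one per paper.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["A", "B", "C"], ["F"]]
graph = build_coauthor_graph(papers)
degree = {author: len(neighbours) for author, neighbours in graph.items()}
```

Here `degree` counts distinct co-authors per author, one simple proxy for influence; a real study would likely use a graph library and richer centrality measures.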
LCA and energy efficiency in buildings: mapping more than twenty years of research
Pub Date : 2024-08-23 DOI: arxiv-2409.00065
F. Asdrubali, A. Fronzetti Colladon, L. Segneri, D. M. Gandola
Research on Life Cycle Assessment (LCA) is being conducted in various sectors, from analyzing building materials and components to comprehensive evaluations of entire structures. However, reviews of the existing literature have been unable to provide a comprehensive overview of research in this field, leaving scholars without a definitive guideline for future investigations. This paper aims to fill this gap, mapping more than twenty years of research. Using an innovative methodology that combines social network analysis and text mining, the paper examined 8024 scientific abstracts. The authors identified seven key thematic groups, building and sustainability clusters (BSCs). To assess their significance in the broader discourse on building and sustainability, the semantic brand score (SBS) indicator was applied. Additionally, building and sustainability trends were tracked, focusing on the LCA concept. The major research topics mainly relate to building materials and energy efficiency. In addition to presenting an innovative approach to reviewing extensive literature domains, the article also provides insights into emerging and underdeveloped themes, outlining crucial future research directions.
Citations: 0
Towards a Knowledge Graph for Models and Algorithms in Applied Mathematics
Pub Date : 2024-08-19 DOI: arxiv-2408.10003
Björn Schembera, Frank Wübbeling, Hendrik Kleikamp, Burkhard Schmidt, Aurela Shehu, Marco Reidelbach, Christine Biedinger, Jochen Fiedler, Thomas Koprucki, Dorothea Iglezakis, Dominik Göddeke
Mathematical models and algorithms are an essential part of mathematical research data, as they are epistemically grounding numerical data. In order to represent models and algorithms as well as their relationship semantically to make this research data FAIR, two previously distinct ontologies were merged and extended, becoming a living knowledge graph. The link between the two ontologies is established by introducing computational tasks, as they occur in modeling, corresponding to algorithmic tasks. Moreover, controlled vocabularies are incorporated and a new class, distinguishing base quantities from specific use case quantities, was introduced. Also, both models and algorithms can now be enriched with metadata. Subject-specific metadata is particularly relevant here, such as the symmetry of a matrix or the linearity of a mathematical model. This is the only way to express specific workflows with concrete models and algorithms, as the feasible solution algorithm can only be determined if the mathematical properties of a model are known. We demonstrate this using two examples from different application areas of applied mathematics. In addition, we have already integrated over 250 research assets from applied mathematics into our knowledge graph.
Citations: 0
Can we measure the impact of a database?
Pub Date : 2024-08-19 DOI: arxiv-2408.09842
Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello, He Sun
In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
Citations: 0
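To make the h-index idea in the abstract above concrete, here is a minimal sketch. The `h_index` function is the standard definition computed in linear time. The `level_h_index` function is only an assumption for illustration: it treats each top-level part of a hypothetical database tree as one "publication" whose citations are the sum over its subtree. It is not claimed to be the hierarchical definition given in the paper.

```python
def h_index(citations):
    """Largest h such that at least h items have >= h citations (O(n))."""
    n = len(citations)
    buckets = [0] * (n + 1)
    for c in citations:
        buckets[min(c, n)] += 1  # counts above n cannot raise h further
    total = 0
    for h in range(n, -1, -1):
        total += buckets[h]
        if total >= h:
            return h
    return 0


def subtree_citations(node):
    """Citations of a node plus all of its descendants."""
    return node["citations"] + sum(subtree_citations(c) for c in node["children"])


def level_h_index(root):
    """Assumed illustration: h-index over the root's direct children."""
    return h_index([subtree_citations(child) for child in root["children"]])


# Hypothetical database hierarchy; names and numbers are made up.
db = {"citations": 0, "children": [
    {"citations": 1, "children": [
        {"citations": 3, "children": []},
        {"citations": 2, "children": []},
    ]},
    {"citations": 4, "children": []},
    {"citations": 0, "children": [{"citations": 1, "children": []}]},
]}
```

With this toy tree the three top-level parts carry 6, 4, and 1 citations, so `level_h_index(db)` evaluates to 2. Choosing a different cut through the hierarchy would give a different decomposition and possibly a different value.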
Creating Publishing Accounts for University Professors on Global Scientific Websites (ORCID, Research Gate, Google Scholar)
Pub Date : 2024-08-16 DOI: arxiv-2408.08936
Ahmed Shaker Alalaq
Perhaps among the most prominent sites on which we always encourage professors to create accounts are (ORCID), (Research Gate), and (Google Scholar), and how to publish and promote their research through social media or through educational platforms, conferences, and scientific workshops. Then we try to explain, in the course of the research, in a smooth manner, the ways to activate accounts on these platforms, supported by pictures and a comprehensive step-by-step explanation, as a gesture to encourage the spread of the culture of electronic publishing in light of the escalation of the digital and computing revolution and the desire to catch up with its accelerating pace.
Citations: 0
An Analysis of the Impact of Gold Open Access Publications in Computer Science
Pub Date : 2024-08-15 DOI: arxiv-2408.10262
Padraig Cunningham, Barry Smyth
There has been some concern about the impact of predatory publishers on scientific research for some time. Recently, publishers that might previously have been considered 'predatory' have established their bona fides, at least to the extent that they are included in citation impact scores such as the field-weighted citation impact (FWCI). These are sometimes called 'grey' publishers (MDPI, Frontiers, Hindawi). In this paper, we show that the citation landscape for these grey publications is significantly different from the mainstream landscape and that affording publications in these venues the same status as publications in mainstream journals may significantly distort metrics such as the FWCI.
Citations: 0
NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval
Pub Date : 2024-08-14 DOI: arxiv-2408.07785
Giuseppe De Gregorio, Simon Perrin, Rodrigo C. G. Pena, Isabelle Marthot-Santaniello, Harold Mouchère
The intersection of computer vision and machine learning has emerged as a promising avenue for advancing historical research, facilitating a more profound exploration of our past. However, the application of machine learning approaches in historical palaeography is often met with criticism due to their perceived "black box" nature. In response to this challenge, we introduce NeuroPapyri, an innovative deep learning-based model specifically designed for the analysis of images containing ancient Greek papyri. To address concerns related to transparency and interpretability, the model incorporates an attention mechanism. This attention mechanism not only enhances the model's performance but also provides a visual representation of the image regions that significantly contribute to the decision-making process. Specifically calibrated for processing images of papyrus documents with lines of handwritten text, the model utilizes individual attention maps to inform the presence or absence of specific characters in the input image. This paper presents the NeuroPapyri model, including its architecture and training methodology. Results from the evaluation demonstrate NeuroPapyri's efficacy in document retrieval, showcasing its potential to advance the analysis of historical manuscripts.
Citations: 0
Optical Music Recognition in Manuscripts from the Ricordi Archive
Pub Date : 2024-08-14 DOI: arxiv-2408.10260
Federico Simonetta, Rishav Mondal, Luca Andrea Ludovico, Stavros Ntalampiras
The Ricordi archive, a prestigious collection of significant musical manuscripts from renowned opera composers such as Donizetti, Verdi and Puccini, has been digitized. This process has allowed us to automatically extract samples that represent various musical elements depicted on the manuscripts, including notes, staves, clefs, erasures, and composer's annotations, among others. To distinguish between digitization noise and actual music elements, a subset of these images was meticulously grouped and labeled by multiple individuals into several classes. After assessing the consistency of the annotations, we trained multiple neural network-based classifiers to differentiate between the identified music elements. The primary objective of this study was to evaluate the reliability of these classifiers, with the ultimate goal of using them for the automatic categorization of the remaining unannotated data set. The dataset, complemented by manual annotations, models, and source code used in these experiments are publicly accessible for replication purposes.
Citations: 0
A New Framework for Error Analysis in Computational Paleographic Dating of Greek Papyri
Pub Date : 2024-08-14 DOI: arxiv-2408.07779
Giuseppe De Gregorio, Lavinia Ferretti, Rodrigo C. G. Pena, Isabelle Marthot-Santaniello, Maria Konstantinidou, John Pavlopoulos
The study of Greek papyri from ancient Egypt is fundamental for understanding Graeco-Roman Antiquity, offering insights into various aspects of ancient culture and textual production. Palaeography, traditionally used for dating these manuscripts, relies on identifying chronologically relevant features in handwriting styles yet lacks a unified methodology, resulting in subjective interpretations and inconsistencies among experts. Recent advances in digital palaeography, which leverage artificial intelligence (AI) algorithms, have introduced new avenues for dating ancient documents. This paper presents a comparative analysis between an AI-based computational dating model and human expert palaeographers, using a novel dataset named Hell-Date comprising securely fine-grained dated Greek papyri from the Hellenistic period. The methodology involves training a convolutional neural network on visual inputs from Hell-Date to predict precise dates of papyri. In addition, experts provide palaeographic dating for comparison. To compare, we developed a new framework for error analysis that reflects the inherent imprecision of the palaeographic dating method. The results indicate that the computational model achieves performance comparable to that of human experts. These elements will help assess on a more solid basis future developments of computational algorithms to date Greek papyri.
Citations: 0
Evaluating Research Quality with Large Language Models: An Analysis of ChatGPT's Effectiveness with Different Settings and Inputs
Pub Date : 2024-08-13 DOI: arxiv-2408.06752
Mike Thelwall
Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. This article assesses which ChatGPT inputs (full text without tables, figures and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. The results show that the optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66). The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into the human scale scores, which is 31% more accurate than guessing.
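The pipeline the abstract describes (average repeated model scores per paper, correlate the averages with human scores, then fit a linear regression to map model scores onto the human scale) can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the four-paper dataset are made up for the example, and the abstract does not say which correlation coefficient was used, so Pearson is an assumption here.

```python
from statistics import mean


def average_scores(runs):
    """Average the repeated model scores for each paper.

    runs[i] holds every score the model gave paper i across iterations.
    """
    return [mean(scores) for scores in runs]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx


# Hypothetical data: 4 papers, 3 model iterations each (NOT from the paper,
# which used 51 papers and 30 iterations).
model_runs = [
    [2.0, 3.0, 2.5],
    [3.0, 3.0, 3.0],
    [3.5, 4.0, 3.0],
    [2.0, 2.0, 2.0],
]
human_scores = [2.0, 3.0, 3.0, 1.0]

model_avg = average_scores(model_runs)
r = pearson(model_avg, human_scores)
slope, intercept = fit_line(model_avg, human_scores)

# Model averages rescaled onto the human scoring scale.
predicted_human = [slope * x + intercept for x in model_avg]
```

Averaging before correlating matters because individual model outputs vary between iterations; the mean is a more stable estimate per paper. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` from the standard library could replace the two hand-written helpers.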
Citations: 0