arXiv - CS - Digital Libraries: Latest Papers

'Intelligence Studies Network': A human-curated database for indexing resources with open-source tools
Pub Date : 2024-08-07 DOI: arxiv-2408.03868
Yusuf A. Ozkan
The Intelligence Studies Network is a comprehensive resource database for publications, events, conferences, and calls for papers in the field of intelligence studies. It offers a novel solution for monitoring, indexing, and visualising resources. Sources are automatically monitored and added to a manually curated database, ensuring the relevance of items to intelligence studies. Curated outputs are stored in a group library on Zotero, an open-source reference management tool. The metadata of items in Zotero is enriched with OpenAlex, an open access bibliographic database. Finally, outputs are listed and visualised on a Streamlit app, an open-source Python framework for building apps. This paper aims to explain the Intelligence Studies Network database and provide a detailed guide on data sources and the workflow. This study demonstrates that it is possible to create a specialised academic database by using open source tools.
Citations: 0
Simplifying Scholarly Abstracts for Accessible Digital Libraries
Pub Date : 2024-08-07 DOI: arxiv-2408.03899
Haining Wang, Jason Clark
Standing at the forefront of knowledge dissemination, digital libraries curate vast collections of scientific literature. However, these scholarly writings are often laden with jargon and tailored for domain experts rather than the general public. As librarians, we strive to offer services to a diverse audience, including those with lower reading levels. To extend our services beyond mere access, we propose fine-tuning a language model to rewrite scholarly abstracts into more comprehensible versions, thereby making scholarly literature more accessible when requested. We began by introducing a corpus specifically designed for training models to simplify scholarly abstracts. This corpus consists of over three thousand pairs of abstracts and significance statements from diverse disciplines. We then fine-tuned four language models using this corpus. The outputs from the models were subsequently examined both quantitatively for accessibility and semantic coherence, and qualitatively for language quality, faithfulness, and completeness. Our findings show that the resulting models can improve readability by over three grade levels, while maintaining fidelity to the original content. Although commercial state-of-the-art models still hold an edge, our models are much more compact, can be deployed locally in an affordable manner, and alleviate the privacy concerns associated with using commercial models. We envision this work as a step toward more inclusive and accessible libraries, improving our services for young readers and those without a college degree.
Citations: 0
The OpenCitations Index
Pub Date : 2024-08-05 DOI: arxiv-2408.02321
Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti
This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection involves citation data harvested from multiple sources. To address the possibility of different sources providing citation data for bibliographic entities represented with different identifiers, therefore potentially representing the same citation, a deduplication mechanism has been implemented. This ensures that citations integrated into the OpenCitations Index are uniquely identified, even when different identifiers are used. This mechanism follows a specific workflow, which encompasses preprocessing of the original source data, management of the provided bibliographic metadata, and the generation of new citation data to be integrated into the OpenCitations Index. The process relies on another data collection, OpenCitations Meta, and on the use of a new globally persistent identifier, namely OMID (OpenCitations Meta Identifier). As of July 2024, the OpenCitations Index stores over 2 billion unique citation links, harvested from Crossref, the National Institutes of Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center (JaLC). The OpenCitations Index can be systematically accessed and queried through several services, including a SPARQL endpoint, REST APIs, and web interfaces. Additionally, dataset dumps are available for free download and reuse (under a CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including provenance and change tracking information.
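The deduplication idea in this abstract can be illustrated with a minimal sketch. It assumes a lookup table that resolves every external identifier (DOI, PMID, and so on) to a single internal identifier per bibliographic entity, in the spirit of OMID. The table contents, the identifier strings, and the function name `deduplicate_citations` are hypothetical and are not taken from the OpenCitations codebase.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup: several external identifiers resolve to one internal
# identifier per bibliographic entity. All values are invented placeholders.
ID_TO_OMID: Dict[str, str] = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",
}


def deduplicate_citations(
    raw: Iterable[Tuple[str, str]],
    id_to_omid: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Collapse (citing, cited) pairs that denote the same citation.

    Each pair is rewritten to internal identifiers; a pair is kept only
    the first time its rewritten form is seen.
    """
    seen = set()
    unique: List[Tuple[str, str]] = []
    for citing_id, cited_id in raw:
        citing = id_to_omid.get(citing_id)
        cited = id_to_omid.get(cited_id)
        if citing is None or cited is None:
            # Assumption of this sketch: unmapped identifiers are skipped.
            # A real workflow would presumably register a new entity instead.
            continue
        key = (citing, cited)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# The same citation reported by two sources under different identifier schemes.
raw_citations = [
    ("doi:10.1000/a", "doi:10.1000/b"),
    ("pmid:111", "pmid:222"),
]
print(deduplicate_citations(raw_citations, ID_TO_OMID))
```

Keying each citation on the pair of internal identifiers is what lets two source records with different external identifiers collapse into one link.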
Citations: 0
The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development
Pub Date : 2024-08-05 DOI: arxiv-2408.05239
Joshua Morriss, Tod Brindle, Jessica Bah Rösman, Daniel Reibsamen, Andreas Enz
Systematic literature reviews are the highest quality of evidence in research. However, the review process is hindered by significant resource and data constraints. The Literature Review Network (LRN) is a first-of-its-kind explainable AI platform adhering to PRISMA 2020 standards, designed to automate the entire literature review process. LRN was evaluated in the domain of surgical glove practices using 3 search strings developed by experts to query PubMed. A non-expert trained all LRN models. Performance was benchmarked against an expert manual review. Explainability and performance metrics assessed LRN's ability to replicate the experts' review. Concordance was measured with the Jaccard index and confusion matrices. Researchers were blinded to the other's results until study completion. Overlapping studies were integrated into an LRN-generated systematic review. LRN models demonstrated superior classification accuracy without expert training, achieving 84.78% and 85.71% accuracy. The highest performance model achieved high interrater reliability (k = 0.4953) and explainability metrics, linking 'reduce', 'accident', and 'sharp' with 'double-gloving'. Another LRN model covered 91.51% of the relevant literature despite diverging from the non-expert's judgments (k = 0.2174), with the terms 'latex', 'double' (gloves), and 'indication'. LRN outperformed the manual review (19,920 minutes over 11 months), reducing the entire process to 288.6 minutes over 5 days. This study demonstrates that explainable AI does not require expert training to successfully conduct PRISMA-compliant systematic literature reviews like an expert. LRN summarized the results of surgical glove studies and identified themes that were nearly identical to the clinical researchers' findings. Explainable AI can accurately expedite our understanding of clinical practices, potentially revolutionizing healthcare research.
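For readers unfamiliar with the Jaccard index mentioned in this abstract, the following minimal sketch computes it for two sets of included studies: the size of the intersection divided by the size of the union. The study identifiers and the function name `jaccard_index` are illustrative assumptions, not data or code from the paper.

```python
from typing import Hashable, Iterable


def jaccard_index(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return |A intersect B| / |A union B| for two collections of study IDs."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty selections agree fully.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical selections: studies kept by the manual review and by a model.
manual_review = {"s1", "s2", "s3", "s4"}
model_review = {"s3", "s4", "s5"}
print(jaccard_index(manual_review, model_review))  # 2 shared / 5 total = 0.4
```

A value of 1.0 means both reviews selected exactly the same studies, and 0.0 means they share none.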
Citations: 0
The Unique Citing Documents Journal Impact Factor (Uniq-JIF) as a Supplement for the standard Journal Impact Factor
Pub Date : 2024-08-05 DOI: arxiv-2408.08884
Zhesi Shen, Li Li, Yu Liao
This paper introduces the Unique Citing Documents Journal Impact Factor (Uniq-JIF) as a supplement to the traditional Journal Impact Factor (JIF). The Uniq-JIF counts each citing document only once, aiming to reduce the effects of citation manipulations. Analysis of 2023 Journal Citation Reports data shows that for most journals, the Uniq-JIF is less than 20% lower than the JIF, though some journals show a drop of over 75%. The Uniq-JIF also highlights significant reductions for journals suppressed due to citation issues, indicating its effectiveness in identifying problematic journals. The Uniq-JIF offers a more nuanced view of a journal's influence and can help reveal journals needing further scrutiny.
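The difference between the two indicators can be sketched as follows, under stated assumptions. The abstract says only that the Uniq-JIF counts each citing document once. This sketch therefore assumes the JIF numerator is the number of citation links to a journal's items in the citation window, the Uniq-JIF numerator is the number of distinct citing documents, and both share the same denominator of citable items. The function name `jif_and_uniq_jif` and all data are hypothetical.

```python
from typing import Iterable, Tuple


def jif_and_uniq_jif(
    citations: Iterable[Tuple[str, str]],
    citable_items: int,
) -> Tuple[float, float]:
    """Return (JIF, Uniq-JIF) for one journal.

    `citations` holds (citing_document_id, cited_item_id) pairs that point
    at the journal's items; `citable_items` is the shared denominator.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    pairs = list(citations)
    total_links = len(pairs)                          # every link counts
    unique_citing = len({doc for doc, _ in pairs})    # each document once
    return total_links / citable_items, unique_citing / citable_items


# Hypothetical journal: document "A" cites three of the journal's items.
links = [("A", "p1"), ("A", "p2"), ("B", "p1"), ("C", "p3"), ("A", "p3")]
jif, uniq_jif = jif_and_uniq_jif(links, citable_items=2)
relative_drop = 1 - uniq_jif / jif
print(jif, uniq_jif)  # 5 links and 3 distinct citing documents over 2 items
```

A large relative drop indicates that a few citing documents account for many of the journal's citations, which is the pattern the abstract associates with journals needing further scrutiny.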
Citations: 0
The Artificial Intelligence Disclosure (AID) Framework: An Introduction
Pub Date : 2024-08-04 DOI: arxiv-2408.01904
Kari D. Weaver
As the use of Generative Artificial Intelligence tools has grown in higher education and research, there have been increasing calls for transparency and granularity around the use and attribution of these tools. Thus far, this need has been met via the recommended inclusion of a note, with little to no guidance on what the note itself should include. This has been identified as a problem for the use of AI in academic and research contexts. This article introduces The Artificial Intelligence Disclosure (AID) Framework, a standard, comprehensive, and detailed framework meant to inform the development and writing of GenAI disclosure for education and research.
Citations: 0
Hotspots and Trends in Magnetoencephalography Research (2013-2022): A Bibliometric Analysis
Pub Date : 2024-08-02 DOI: arxiv-2408.08877
Shen Liu, Jingwen Zhao
This study aimed to utilize bibliometric methods to analyze trends in international Magnetoencephalography (MEG) research from 2013 to 2022. Due to the limited volume of domestic literature on MEG, this analysis focuses solely on the global research landscape, providing insights from the past decade as a representative sample. This study utilized bibliometric methods to explore and analyze the progress, hotspots and developmental trends in international MEG research spanning from 1995 to 2022. The results indicated a dynamic and steady growth trend in the overall number of publications in MEG. Ryusuke Kakigi emerged as the most prolific author, while Neuroimage led as the most prolific journal. Current hotspots in MEG research encompass resting state, networks, functional connectivity, phase dynamics, oscillation, and more. Future trends in MEG research are poised to advance across three key aspects: disease treatment and practical applications, experimental foundations and technical advancements, and fundamental and advanced human cognition. In the future, there should be a focus on enhancing cross-integration and utilization of MEG with other instruments to diversify research methodologies in this field.
Citations: 0
Harvesting Textual and Structured Data from the HAL Publication Repository
Pub Date : 2024-07-30 DOI: arxiv-2407.20595
Francis Kulumba, Wissam Antoun, Guillaume Vimont, Laurent Romary
HAL (Hyper Articles en Ligne) is the French national publication repository, used by most higher education and research organizations for their open science policy. As a digital library, it is a rich repository of scholarly documents, but its potential for advanced research has been underutilized. We present HALvest, a unique dataset that bridges the gap between citation networks and the full text of papers submitted on HAL. We craft our dataset by filtering HAL for scholarly publications, resulting in approximately 700,000 documents, spanning 34 languages across 13 identified domains, suitable for language model training, and yielding approximately 16.5 billion tokens (with 8 billion in French and 7 billion in English, the most represented languages). We transform the metadata of each paper into a citation network, producing a directed heterogeneous graph. This graph includes uniquely identified authors on HAL, as well as all open submitted papers, and their citations. We provide a baseline for authorship attribution using the dataset, implement a range of state-of-the-art models in graph representation learning for link prediction, and discuss the usefulness of our generated knowledge graph structure.
Citations: 0
R-Index: A Metric for Assessing Researcher Contributions to Peer Review
Pub Date : 2024-07-29 DOI: arxiv-2407.19949
Milad Malekzadeh
I propose the R-Index, defined as the difference between the sum of review responsibilities for a researcher's publications and the number of reviews they have completed, as a novel metric to effectively characterize a researcher's contribution to the peer review process. This index aims to balance the demands placed on the peer review system by a researcher's publication output with their engagement in reviewing others' work, providing a measure of whether they are giving back to the academic community commensurately with their own publication demands. The R-Index offers a straightforward and fair approach to encourage equitable participation in peer review, thereby supporting the sustainability and efficiency of the scholarly publishing process.
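The definition in this abstract can be illustrated with a small sketch. It follows only the stated formula (sum of review responsibilities minus reviews completed). The abstract does not say how a per-paper review responsibility is assigned, so the `Publication` class, the `review_responsibility` field, and the sample values below are hypothetical inputs chosen for illustration, not the author's method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Publication:
    """One paper, with the review responsibility attributed to the researcher."""
    title: str
    # Hypothetical input: the abstract does not specify how this value is assigned.
    review_responsibility: float


def r_index(publications: Iterable[Publication], reviews_completed: int) -> float:
    """Sum of review responsibilities minus the number of reviews completed."""
    total_responsibility = sum(p.review_responsibility for p in publications)
    return total_responsibility - reviews_completed


# Made-up example values, for illustration only.
papers = [
    Publication("Paper A", 1.5),
    Publication("Paper B", 0.5),
    Publication("Paper C", 1.0),
]
print(r_index(papers, reviews_completed=2))  # 3.0 - 2 = 1.0
```

Under this sign convention, a positive value would suggest that the researcher's publications have placed more demand on the review system than their completed reviews have returned, and a negative value the reverse.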
Citations: 0
Predicting citation impact of research papers using GPT and other text embeddings
Pub Date : 2024-07-29 DOI: arxiv-2407.19942
Adilson Vital Jr., Filipi N. Silva, Osvaldo N. Oliveira Jr., Diego R. Amancio
The impact of research papers, typically measured in terms of citation counts, depends on several factors, including the reputation of the authors, journals, and institutions, in addition to the quality of the scientific work. In this paper, we present an approach that combines natural language processing and machine learning to predict the impact of papers in a specific journal. Our focus is on the text, which should correlate with impact and the topics covered in the research. We employed a dataset of over 40,000 articles from ACS Applied Materials and Interfaces spanning from 2012 to 2022. The data was processed using various text embedding techniques and classified with supervised machine learning algorithms. Papers were categorized into the top 20% most cited within the journal, using both yearly and cumulative citation counts as metrics. Our analysis reveals that the method employing generative pre-trained transformers (GPT) was the most efficient for embedding, while the random forest algorithm exhibited the best predictive power among the machine learning algorithms. An optimized accuracy of 80% in predicting whether a paper was among the top 20% most cited was achieved for the cumulative citation count when abstracts were processed. This accuracy is noteworthy, considering that author, institution, and early citation pattern information were not taken into account. The accuracy increased only slightly when the full texts of the papers were processed. Also significant is the finding that a simpler embedding technique, term frequency-inverse document frequency (TFIDF), yielded performance close to that of GPT. Since TFIDF captures the topics of the paper we infer that, apart from considering author and institution biases, citation counts for the considered journal may be predicted by identifying topics and "reading" the abstract of a paper.
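Two steps mentioned in this abstract, labeling the top 20% most cited papers and weighting abstract terms by TFIDF, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the whitespace tokenizer, the unsmoothed TFIDF formula, and the toy data are all hypothetical, and the GPT embedding and random forest steps are omitted.

```python
import math
from collections import Counter


def top_fraction_labels(citations, fraction=0.2):
    """Return 1 for papers in the top `fraction` by citation count, else 0."""
    n_top = max(1, round(len(citations) * fraction))
    # Indices sorted from most to least cited; ties keep their original order.
    ranked = sorted(range(len(citations)), key=lambda i: -citations[i])
    top = set(ranked[:n_top])
    return [1 if i in top else 0 for i in range(len(citations))]


def tfidf(documents):
    """Return one {term: weight} dict per document using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy data, made up for illustration only.
abstracts = ["perovskite solar cell", "solar cell stability", "graphene sensor"]
citation_counts = [3, 40, 7, 1, 12]

labels = top_fraction_labels(citation_counts)  # binary targets for a classifier
vectors = tfidf(abstracts)                     # sparse text features
```

In a fuller version, the TFIDF vectors (or embeddings from a language model) would be the features and the top-20% labels the targets of a supervised classifier such as a random forest.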
Citations: 0