结构化摘要生成器（SAG）模型：分析文章的 IMRAD 结构及其对提取式摘要的影响

IF 1.7 Q2 INFORMATION SCIENCE & LIBRARY SCIENCE International Journal on Digital Libraries Pub Date : 2024-05-07 DOI:10.1007/s00799-024-00402-8

Ayşe Esra Özkan Çelik, Umut Al

{"title":"结构化摘要生成器（SAG）模型：分析文章的 IMRAD 结构及其对提取式摘要的影响","authors":"Ayşe Esra Özkan Çelik, Umut Al","doi":"10.1007/s00799-024-00402-8","DOIUrl":null,"url":null,"abstract":"<p>An abstract is the most crucial element that may convince readers to read the complete text of a scientific publication. However, studies show that in terms of organization, readability, and style, abstracts are also among the most troublesome parts of the pertinent manuscript. The ultimate goal of this article is to produce better understandable abstracts with automatic methods that will contribute to scientific communication in Turkish. We propose a summarization system based on extractive techniques combining general features that have been shown to be beneficial for Turkish. To construct the data set for this aim, a sample of 421 peer-reviewed Turkish articles in the field of librarianship and information science was developed. First, the structure of the full-texts, and their readability in comparison with author abstracts, were examined for text quality evaluation. A content-based evaluation of the system outputs was then carried out. System outputs, in cases of using and ignoring structural features of full-texts, were compared. Structured outputs outperformed classical outputs in terms of content and text quality. Each output group has better readability levels than their original abstracts. Additionally, it was discovered that higher-quality outputs are correlated with more structured full-texts, highlighting the importance of structural writing. Finally, it was determined that our system can facilitate the scholarly communication process as an auxiliary tool for authors and editors. Findings also indicate the significance of structural writing for better scholarly communication.\n</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"27 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Structured abstract generator (SAG) model: analysis of IMRAD structure of articles and its effect on extractive summarization\",\"authors\":\"Ayşe Esra Özkan Çelik, Umut Al\",\"doi\":\"10.1007/s00799-024-00402-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>An abstract is the most crucial element that may convince readers to read the complete text of a scientific publication. However, studies show that in terms of organization, readability, and style, abstracts are also among the most troublesome parts of the pertinent manuscript. The ultimate goal of this article is to produce better understandable abstracts with automatic methods that will contribute to scientific communication in Turkish. We propose a summarization system based on extractive techniques combining general features that have been shown to be beneficial for Turkish. To construct the data set for this aim, a sample of 421 peer-reviewed Turkish articles in the field of librarianship and information science was developed. First, the structure of the full-texts, and their readability in comparison with author abstracts, were examined for text quality evaluation. A content-based evaluation of the system outputs was then carried out. System outputs, in cases of using and ignoring structural features of full-texts, were compared. Structured outputs outperformed classical outputs in terms of content and text quality. Each output group has better readability levels than their original abstracts. Additionally, it was discovered that higher-quality outputs are correlated with more structured full-texts, highlighting the importance of structural writing. Finally, it was determined that our system can facilitate the scholarly communication process as an auxiliary tool for authors and editors. Findings also indicate the significance of structural writing for better scholarly communication.\\n</p>\",\"PeriodicalId\":44974,\"journal\":{\"name\":\"International Journal on Digital Libraries\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2024-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal on Digital Libraries\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00799-024-00402-8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"INFORMATION SCIENCE & LIBRARY SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00799-024-00402-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}

引用次数: 0

摘要

摘要是说服读者阅读科学出版物全文的最关键要素。然而，研究表明，就组织、可读性和风格而言，摘要也是相关稿件中最麻烦的部分之一。本文的最终目标是通过自动方法制作出更好理解的摘要，从而为土耳其语的科学交流做出贡献。我们提出了一种基于提取技术的摘要系统，该系统结合了已被证明对土耳其语有益的一般特征。为了构建实现这一目标的数据集，我们开发了图书馆学和信息科学领域的 421 篇同行评议的土耳其语文章样本。首先，对全文的结构及其与作者摘要的可读性进行了检查，以便对文本质量进行评估。然后对系统输出进行了基于内容的评估。比较了使用和忽略全文结构特征的系统输出结果。就内容和文本质量而言，结构化输出优于传统输出。每个输出组都比其原始摘要具有更好的可读性。此外，我们还发现，质量更高的输出结果与结构更合理的全文相关，这凸显了结构化写作的重要性。最后，我们确定我们的系统可以作为作者和编辑的辅助工具，促进学术交流过程。研究结果还表明，结构化写作对于更好地进行学术交流具有重要意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Structured abstract generator (SAG) model: analysis of IMRAD structure of articles and its effect on extractive summarization

An abstract is the most crucial element that may convince readers to read the complete text of a scientific publication. However, studies show that in terms of organization, readability, and style, abstracts are also among the most troublesome parts of the pertinent manuscript. The ultimate goal of this article is to produce better understandable abstracts with automatic methods that will contribute to scientific communication in Turkish. We propose a summarization system based on extractive techniques combining general features that have been shown to be beneficial for Turkish. To construct the data set for this aim, a sample of 421 peer-reviewed Turkish articles in the field of librarianship and information science was developed. First, the structure of the full-texts, and their readability in comparison with author abstracts, were examined for text quality evaluation. A content-based evaluation of the system outputs was then carried out. System outputs, in cases of using and ignoring structural features of full-texts, were compared. Structured outputs outperformed classical outputs in terms of content and text quality. Each output group has better readability levels than their original abstracts. Additionally, it was discovered that higher-quality outputs are correlated with more structured full-texts, highlighting the importance of structural writing. Finally, it was determined that our system can facilitate the scholarly communication process as an auxiliary tool for authors and editors. Findings also indicate the significance of structural writing for better scholarly communication.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal on Digital Libraries

CiteScore

4.30

自引率

6.70%

发文量

期刊介绍： The International Journal on Digital Libraries (IJDL) examines the theory and practice of acquisition definition organization management preservation and dissemination of digital information via global networking. It covers all aspects of digital libraries (DLs) from large-scale heterogeneous data and information management & access to linking and connectivity to security privacy and policies to its application use and evaluation.The scope of IJDL includes but is not limited to: The FAIR principle and the digital libraries infrastructure Findable: Information access and retrieval; semantic search; data and information exploration; information navigation; smart indexing and searching; resource discovery Accessible: visualization and digital collections; user interfaces; interfaces for handicapped users; HCI and UX in DLs; Security and privacy in DLs; multimodal access Interoperable: metadata (definition management curation integration); syntactic and semantic interoperability; linked data Reusable: reproducibility; Open Science; sustainability profitability repeatability of research results; confidentiality and privacy issues in DLs Digital Library Architectures including heterogeneous and dynamic data management; data and repositories Acquisition of digital information: authoring environments for digital objects; digitization of traditional content Digital Archiving and Preservation Digital Preservation and curation Digital archiving Web Archiving Archiving and preservation Strategies AI for Digital Libraries Machine Learning for DLs Data Mining in DLs NLP for DLs Applications of Digital Libraries Digital Humanities Open Data and their reuse Scholarly DLs (incl. bibliometrics altmetrics) Epigraphy and Paleography Digital Museums Future trends in Digital Libraries Definition of DLs in a ubiquitous digital library world Datafication of digital collections Interaction and user experience (UX) in DLs Information visualization Collection understanding Privacy and security Multimodal user interfaces Accessibility (or "Access for users with disabilities") UX studies