A new methodology for automatic creation of concept maps of Turkish texts
Merve Bayrak, Deniz Dal
Language Resources and Evaluation (Journal Article), published 2024-01-28
DOI: 10.1007/s10579-023-09713-9 (https://doi.org/10.1007/s10579-023-09713-9)
Journal impact factor: 1.7; JCR: Q3, Computer Science, Interdisciplinary Applications
Citations: 0
Abstract
Concept maps are two-dimensional visual tools that describe the relationships between concepts belonging to a particular subject. Creating these maps manually raises several problems, especially for text-intensive documents: it requires expertise in the relevant field, visual complexity has to be kept to a minimum, and separate maps have to be integrated. Automatic creation of concept maps is needed to overcome these problems. However, producing a fully automatic concept map of human-made quality from a document has not yet been achieved satisfactorily. Motivated by this observation, this study develops, for the first time in the literature, a new methodology for automatically creating concept maps from Turkish text documents. To this end, a new heuristic algorithm was developed that uses the Turkish Natural Language Processing software chain and the Graphviz tool to automatically extract concept maps from Turkish texts. The proposed algorithm obtains concepts from the dependency relations between Turkish words in a sentence, and it selects the sentences to be added to the concept map with a new sentence scoring mechanism. The algorithm was applied to a total of 20 data sets in the fields of Turkish Literature, Geography, Science, and Computer Sciences, and its effectiveness was analyzed with three performance evaluation criteria: precision, recall, and F-score. The findings show that the proposed algorithm is quite effective on Turkish texts that contain concepts, and that the sentence selection algorithm produces results close to the average value for the evaluated performance criteria. According to the findings, the concept maps obtained automatically by the proposed algorithm are quite similar to those extracted manually. However, the algorithm has a limitation: because it depends on a natural language processing tool, it requires manual intervention in some cases.
About the journal:
Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications.
Language resources include language data and descriptions in machine-readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain-specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use.
Evaluation of language resources concerns assessing the state of the art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.