Franz Matthies, Christoph Beger, Ralph Schäfermeier, Alexandr Uciteli
{"title":"Concept Graphs: A Novel Approach for Textual Analysis of Medical Documents.","authors":"Franz Matthies, Christoph Beger, Ralph Schäfermeier, Alexandr Uciteli","doi":"10.3233/SHTI230710","DOIUrl":null,"url":null,"abstract":"<p><p>The task of automatically analyzing the textual content of documents faces a number of challenges in general but even more so when dealing with the medical domain. Here, we can't normally rely on specifically pre-trained NLP models or even, due to data privacy reasons, (massive) amounts of training material to generate said models. We, therefore, propose a method that utilizes general-purpose basic text analysis components and state-of-the-art transformer models to represent a corpus of documents as multiple graphs, wherein important conceptually related phrases from documents constitute the nodes and their semantic relation form the edges. This method could serve as a basis for several explorative procedures and is able to draw on a plethora of publicly available resources. We test it by comparing the effectiveness of these so-called Concept Graphs with another recently suggested approach for a common use case in information retrieval, document clustering.</p>","PeriodicalId":39242,"journal":{"name":"Studies in Health Technology and Informatics","volume":"307 ","pages":"172-179"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Health Technology and Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI230710","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Health Professions","Score":null,"Total":0}
引用次数: 0
Abstract
The task of automatically analyzing the textual content of documents faces a number of challenges in general but even more so when dealing with the medical domain. Here, we can't normally rely on specifically pre-trained NLP models or even, due to data privacy reasons, (massive) amounts of training material to generate said models. We, therefore, propose a method that utilizes general-purpose basic text analysis components and state-of-the-art transformer models to represent a corpus of documents as multiple graphs, wherein important conceptually related phrases from documents constitute the nodes and their semantic relation form the edges. This method could serve as a basis for several explorative procedures and is able to draw on a plethora of publicly available resources. We test it by comparing the effectiveness of these so-called Concept Graphs with another recently suggested approach for a common use case in information retrieval, document clustering.
期刊介绍:
This book series was started in 1990 to promote research conducted under the auspices of the EC programmes’ Advanced Informatics in Medicine (AIM) and Biomedical and Health Research (BHR) bioengineering branch. A driving aspect of international health informatics is that telecommunication technology, rehabilitative technology, intelligent home technology and many other components are moving together and form one integrated world of information and communication media.