Classification of Multilingual Medical Documents using Deep Learning
W. Karaa, Dridi Kawther
2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), 2023-05-23
DOI: 10.1109/SERA57763.2023.10197749 (https://doi.org/10.1109/SERA57763.2023.10197749)
Citations: 0
Abstract
The large number of documents available on the web has made operations such as finding a set of information contained in a document difficult, especially for multilingual documents. This creates a need for effective tools for finding, organizing, and classifying information. A variety of classification methods have been proposed to address this problem, but these techniques suffer from limitations such as the loss of information and the loss of relations between words, which reduce the effectiveness and performance of the classification process. This paper therefore supports the idea of multilingual document classification, especially in the biomedical domain, using a new approach based on deep learning. The key idea is to generate a new conceptual representation of textual multilingual medical documents to facilitate the classification task; a deep learning technique is used to learn this representation. To show the feasibility of our approach, we implemented a system for a domain that attracts growing attention from the data mining community: the biomedical domain. An experimental study is performed using documents extracted from the biomedical benchmark corpus Ohsumed, which contains documents distributed across different categories.