AB-XLNet: Named Entity Recognition Tool for Health Information Technology Standardization
Kyoungsu Oh, Min Kang, SeoHyun Oh, Do-hyoung Kim, Seokhwan Kang, Youngho Lee
2022 13th International Conference on Information and Communication Technology Convergence (ICTC), published 2022-10-19
DOI: 10.1109/ICTC55196.2022.9952819
Abstract
We conducted a study to identify drug-related information in non-standardized discharge summaries using a pre-trained BERT-based model. After tokenizing the dataset, entities were labeled with the IOB tagging schema, and the model was trained on training data augmented with the Random Insert technique through the pre-trained BERT. As a result, AB-XLNet improved the F1-score by 3% compared to XLNet, and it extracted the ADE and Form entity types, which XLNet could not. Future research will focus on presenting a generalized model using large amounts of data from multiple institutions.
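To make the pipeline concrete, the sketch below shows IOB-tagged tokens for a drug mention and a minimal Random Insert augmentation. The example sentence, entity labels (Drug, Strength, Form, Frequency), and the exact insertion strategy are assumptions for illustration; the paper does not specify its Random Insert implementation details.

```python
import random

# Hypothetical discharge-summary fragment with IOB tags:
# B- marks the beginning of an entity span, I- its continuation, O is outside.
tokens = ["Patient", "received", "aspirin", "81", "mg", "tablet", "daily"]
tags   = ["O", "O", "B-Drug", "B-Strength", "I-Strength", "B-Form", "B-Frequency"]

def random_insert(tokens, tags, filler="[PAD]", n=1, seed=0):
    """Augment a tagged sequence by inserting filler tokens labeled O.

    Keeping tokens and tags aligned (same length, same positions) is the
    key invariant for any token-level NER augmentation.
    """
    rng = random.Random(seed)
    toks, tgs = list(tokens), list(tags)
    for _ in range(n):
        i = rng.randrange(len(toks) + 1)  # insertion point, ends allowed
        toks.insert(i, filler)
        tgs.insert(i, "O")                # filler carries no entity label
    return toks, tgs

aug_tokens, aug_tags = random_insert(tokens, tags, n=2)
assert len(aug_tokens) == len(aug_tags) == len(tokens) + 2
```

In practice the augmented token/tag pairs would then be fed to the pre-trained transformer's subword tokenizer, with tags propagated to the first subword of each token.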