Enhancing Named Entity Recognition for Holocaust Testimonies through Pseudo Labelling and Transformer-based Models
Isuri Anuradha Nanomi Arachchige, L. Ha, R. Mitkov, Johannes-Dieter Steinert
Proceedings of the 7th International Workshop on Historical Document Imaging and Processing, published 25 August 2023
DOI: 10.1145/3604951.3605514
Citation count: 1
Abstract
The Holocaust was a tragic and catastrophic event in the history of World War II (WWII) that resulted in the loss of millions of lives. In recent years, the emergence of the digital humanities has made the study of Holocaust testimonies an important area of research for historians, Holocaust educators, social scientists, and linguists. One challenge in analysing these testimonies is recognising and categorising named entities such as concentration camps, military officers, ships, and ghettos, a task made difficult by the scarcity of annotated data. This paper presents a study of domain-specific hybrid named-entity recognition (NER) models tailored to the Holocaust domain. To overcome the scarcity of data, we employed a hybrid annotation approach to train several transformer model architectures to recognise these named entities. The results show that the transformer models perform well compared with other approaches.
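The abstract does not spell out the annotation procedure. As a generic illustration of the pseudo-labelling idea named in the title (not the authors' implementation), the sketch below assumes a tagger that outputs a probability distribution over a hypothetical BIO label set for every token. It keeps only those sentences in which every token's predicted label clears a confidence threshold, so they can be added to the training data. The label names, the 0.9 threshold, the helper `make_dist`, and the example sentences are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO label set: the entity types echo the abstract's examples,
# but the authors' actual label inventory is not given here.
LABELS = ["O", "B-CAMP", "I-CAMP", "B-GHETTO", "I-GHETTO", "B-SHIP", "I-SHIP"]


def pseudo_label(
    sentences: Sequence[List[str]],
    probs: Sequence[List[List[float]]],
    threshold: float = 0.9,
) -> List[Tuple[List[str], List[str]]]:
    """Keep a sentence only if every token's top label probability >= threshold.

    probs[i][j] is the model's probability distribution over LABELS for
    token j of sentence i (e.g. softmax outputs of a fine-tuned tagger).
    Returns (tokens, predicted_labels) pairs to add to the training set.
    """
    selected = []
    for tokens, token_probs in zip(sentences, probs):
        if len(tokens) != len(token_probs):
            raise ValueError("each token needs exactly one probability distribution")
        labels = []
        confident = True
        for distribution in token_probs:
            # Index of the most probable label for this token.
            best = max(range(len(distribution)), key=lambda k: distribution[k])
            if distribution[best] < threshold:
                confident = False  # one uncertain token rejects the whole sentence
                break
            labels.append(LABELS[best])
        if confident:
            selected.append((list(tokens), labels))
    return selected


def make_dist(label: str, p: float) -> List[float]:
    """Hypothetical helper: put probability p on `label`, spread the rest evenly."""
    rest = (1.0 - p) / (len(LABELS) - 1)
    return [p if name == label else rest for name in LABELS]


# Illustrative, made-up model outputs for two unlabelled sentences.
sentences = [["He", "was", "sent", "to", "Auschwitz"], ["They", "boarded", "the", "ship"]]
probs = [
    [make_dist("O", 0.97), make_dist("O", 0.98), make_dist("O", 0.97),
     make_dist("O", 0.99), make_dist("B-CAMP", 0.95)],
    [make_dist("O", 0.97), make_dist("O", 0.96), make_dist("O", 0.98),
     make_dist("B-SHIP", 0.55)],
]
result = pseudo_label(sentences, probs)
print(result)
```

Here the first sentence is kept with the labels `O O O O B-CAMP`, while the second is discarded because the model is only 55% sure about "ship". Rejecting whole sentences rather than individual tokens is one common design choice: it avoids training on partially wrong label sequences, at the cost of discarding more unlabelled data.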