Author: M. Vargas-Vera
Journal: Int. J. Knowl. Soc. Res.
Publication date: 2015-10-01
Publication type: Journal Article
DOI: https://doi.org/10.4018/IJKSR.2015100102
Methodology for Record Linkage: A Medical Domain Case Study
This paper presents a methodology for linking records from several sources, each of which might contain missing information. This assumption of missing values has been made, without loss of generality, as the author has observed that missing information is part of the nature of data in the health domain and also in other domains such as the social sciences. The author's methodology is an attempt to deal with the linkage of records of the same patient in several databases. The first phase of her methodology is called homogenization. The homogenization of the databases/datasets is performed by applying a method which fills in the missing values with predicted values. The second phase of her methodology is called linking of records. It assesses the similarity between records and links the pairs of records with a high level of similarity. Finally, the author presents an evaluation of her methodology. The evaluation of the homogenization phase was carried out using multinomial regression, while the evaluation of the aggregated similarities was performed using the Jaccard, Jaro-Winkler and Monge-Elkan similarity metrics.
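The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not say which predictor fills in the missing values, so a most-frequent-value imputation stands in for it here, and only the Jaccard metric is shown (Jaro-Winkler and Monge-Elkan are omitted). The field names, the sample records, the averaging of per-field similarities, and the 0.8 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical field names; the abstract does not list the actual attributes.
FIELDS = ["name", "city", "blood_type"]


def impute_mode(records, fields):
    """Phase 1 (homogenization), simplified: fill each missing (None) value
    with the most frequent observed value of that field. This is a stand-in
    predictor, not the method used in the paper."""
    filled = [dict(r) for r in records]
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        if not observed:
            continue  # nothing to predict from, leave the gap as it is
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r.get(f) is None:
                r[f] = mode
    return filled


def jaccard(a, b):
    """Jaccard similarity between the lowercase token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def aggregated_similarity(r1, r2, fields):
    """Average the per-field Jaccard scores (the aggregation rule is assumed)."""
    return sum(jaccard(str(r1[f]), str(r2[f])) for f in fields) / len(fields)


def link_records(source_a, source_b, fields, threshold=0.8):
    """Phase 2 (linking): keep every cross-source pair whose aggregated
    similarity reaches the threshold (0.8 is an arbitrary choice)."""
    links = []
    for (i, ra), (j, rb) in product(enumerate(source_a), enumerate(source_b)):
        score = aggregated_similarity(ra, rb, fields)
        if score >= threshold:
            links.append((i, j, score))
    return links


# Invented sample data: two sources with one missing value each.
hospital_a = [
    {"name": "Ana Lopez", "city": "Leeds", "blood_type": "O"},
    {"name": "John Smith", "city": "York", "blood_type": None},
    {"name": "Wei Chen", "city": "York", "blood_type": "O"},
]
hospital_b = [
    {"name": "John Smith", "city": "York", "blood_type": "O"},
    {"name": "Ana Lopez", "city": None, "blood_type": "O"},
    {"name": "Maria Silva", "city": "Leeds", "blood_type": "A"},
    {"name": "Tom Reed", "city": "Leeds", "blood_type": "B"},
]

filled_a = impute_mode(hospital_a, FIELDS)
filled_b = impute_mode(hospital_b, FIELDS)
links = link_records(filled_a, filled_b, FIELDS)
for i, j, score in links:
    print(f"A[{i}] <-> B[{j}]  similarity={score:.2f}")
```

In this toy data the two linked pairs are the records for Ana Lopez and John Smith. Note that a linkage decision made on an imputed value is only as reliable as the prediction behind it, which is presumably why the homogenization phase is evaluated separately in the paper.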