Machine Learning for Efficient Integration of Record Systems for Missing US Service Members
Julia D. Warnke-Sommer, Franklin E. Damann
2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), published 2019-10-01
DOI: 10.1109/DSAA.2019.00071 (https://doi.org/10.1109/DSAA.2019.00071)
Citations: 4
Abstract
More than 16 million Americans served in World War II, and over 400,000 of them were killed in action. Today, more than 72,000 service members from that war remain unaccounted for. The United States continues to locate, recover, and identify missing personnel from World War II and other past conflicts to provide the fullest possible accounting, work that brings closure and resolution to many US families. Fulfilling this mission requires integrating massive amounts of information from historical records, genealogy records, anthropological data, archaeological data, odontology data, and DNA. These disparate data sources are produced and maintained by multiple agencies, each with its own data governance rules and its own internal structure for service member information. Previously, records from these sources were extracted, transformed, and loaded (ETL) manually, which created the potential for human error and required a large number of person-hours to synthesize the data on a biweekly basis. To address these issues, we implemented (i) a regex decision tree that translates genealogical relationships into DNA type availability and (ii) a machine learning approach for record linkage between the disparate data sources. The application is currently in production; it greatly reduces the person-hours needed and has a very low error rate for record translation and integration.
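To make the idea of a regex decision tree more concrete, the following is a minimal Python sketch of how a free-text genealogical relationship might be translated into a set of usable DNA types. It is not the authors' implementation: the function name `dna_types`, the labels `mtDNA`, `Y-STR`, and `auSTR`, and every rule in the tables are assumptions made for illustration. The sketch also assumes the missing service member is male, that each relationship is stated from his point of view, and that relatives are biological. The rules rely only on standard inheritance patterns: mitochondrial DNA passes through the maternal line, the Y chromosome passes from father to son, and autosomal markers are most informative for first-degree relatives.

```python
import re

# Hypothetical rule tables (not taken from the paper).
# First-degree relatives of a male service member.
DIRECT = {
    "mother": {"mtDNA", "auSTR"},
    "father": {"Y-STR", "auSTR"},
    "brother": {"mtDNA", "Y-STR", "auSTR"},
    "sister": {"mtDNA", "auSTR"},
    "son": {"Y-STR", "auSTR"},
    "daughter": {"auSTR"},
}
# Relatives who share the service member's maternal line (mtDNA).
MATERNAL_LINE = {"aunt", "uncle", "grandmother", "brother", "sister"}
# Male relatives who share the service member's paternal line (Y chromosome).
PATERNAL_LINE = {"uncle", "grandfather", "brother"}

# Optional lineage side, optional "half" qualifier, then one kinship term.
PATTERN = re.compile(r"(?:(maternal|paternal) )?(half )?(\w+)")


def dna_types(relationship: str) -> set:
    """Return the DNA types a donor could contribute; empty means manual review."""
    # Normalize case and treat hyphens like spaces ("half-sister" -> "half sister").
    rel = re.sub(r"[\s-]+", " ", relationship.strip().lower())
    match = PATTERN.fullmatch(rel)
    if match is None:
        return set()  # e.g. "brother in law", "step brother"
    side, half, kin = match.groups()
    # Branch 1: an explicit lineage side decides which uniparental marker applies.
    if side == "maternal":
        return {"mtDNA"} if kin in MATERNAL_LINE else set()
    if side == "paternal":
        return {"Y-STR"} if kin in PATERNAL_LINE else set()
    # Branch 2: a half sibling with no stated side is ambiguous.
    if half:
        return set()
    # Branch 3: unqualified first-degree relatives.
    return set(DIRECT.get(kin, set()))
```

For example, `dna_types("Maternal Aunt")` yields only `mtDNA`, while `dna_types("step-brother")` yields an empty set. Returning an empty set for anything unrecognized is a deliberately conservative choice in this sketch: an unmatched relationship is flagged for a person to review rather than guessed at.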