An Unstructured Data Desensitization Approach for Futures Industry
Xiaofan Zhi, Li Xue, Sihao Xie
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, published 2022-11-17
DOI: 10.1145/3581807.3581885 (https://doi.org/10.1145/3581807.3581885)
Citation count: 0
Abstract
The development of big data and artificial intelligence technologies has given financial institutions a powerful boost in data mining, but it has also made the disclosure of private data harder to prevent. Data desensitization is one way to protect private data. Compared with structured data desensitization, unstructured data desensitization still faces several challenges. On the one hand, the accuracy of text recognition from images, voice, video, and other types of unstructured data strongly affects desensitization performance. On the other hand, conventional sensitive information recognition methods, which rely on rules and matching, often produce unacceptable desensitization results on complex financial data. To address these issues, this paper proposes a new method for unstructured data desensitization. The method first applies an evaluation model for text conversion accuracy, based on multi-level fine-grained verification, to improve the accuracy of text recognition. It then introduces a sensitive information recognition model based on hybrid analysis to reduce the missed-detection and false-detection rates of sensitive information recognition. The method achieved satisfactory results on real datasets.
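The abstract contrasts its hybrid-analysis model with conventional rule- and matching-based recognition but does not describe that model's internals. For reference only, here is a minimal Python sketch of the conventional baseline, applied to text that has already been extracted from an unstructured source. Regex rules and an exact-match term dictionary flag candidate spans, overlapping spans are merged, and each span is replaced by a `[LABEL]` placeholder. The patterns, the `DICTIONARY` entries, and the names `find_spans`, `merge_overlaps`, and `desensitize` are hypothetical illustrations and are not taken from the authors' implementation.

```python
import re
from typing import List, Tuple

# (start, end, label): character offsets of one detected sensitive span.
Span = Tuple[int, int, str]

# Hypothetical rule set. These patterns are illustrative assumptions,
# not the rules used in the paper.
RULES = {
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),       # 11-digit mainland mobile number
    "ID_NUMBER": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),  # 18-character resident ID layout
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),   # simple email address shape
}

# Hypothetical dictionary of known sensitive terms (e.g. client names) for exact matching.
DICTIONARY = {"Zhang San": "NAME", "Li Si": "NAME"}


def merge_overlaps(spans: List[Span]) -> List[Span]:
    """Sort spans by position and merge overlapping ones, keeping the earlier span's label."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def find_spans(text: str) -> List[Span]:
    """Collect candidate spans from the regex rules and from exact dictionary matches."""
    spans: List[Span] = []
    for label, pattern in RULES.items():
        spans.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    for term, label in DICTIONARY.items():
        start = text.find(term)
        while start != -1:
            spans.append((start, start + len(term), label))
            start = text.find(term, start + len(term))
    return merge_overlaps(spans)


def desensitize(text: str) -> str:
    """Replace every detected span with a [LABEL] placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, label in find_spans(text):
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    sample = "Client Zhang San, phone 13812345678, email zs@example.com, opened a futures account."
    print(desensitize(sample))
    # Client [NAME], phone [PHONE], email [EMAIL], opened a futures account.
```

Fixed patterns and exact term lists ignore context. A baseline like this can therefore miss sensitive values written in unexpected formats and flag harmless digit strings that happen to fit a pattern. These are the missed and false detections the abstract says the hybrid-analysis model is meant to reduce.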