An Unstructured Data Desensitization Approach for Futures Industry

Xiaofan Zhi, Li Xue, Sihao Xie
Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
Published: 2022-11-17
DOI: 10.1145/3581807.3581885 (https://doi.org/10.1145/3581807.3581885)

Abstract

Advances in big data and artificial intelligence give financial institutions powerful tools for data mining, but they also make it harder to prevent the disclosure of private data. Data desensitization is one way to protect such data. Compared with the desensitization of structured data, the desensitization of unstructured data still faces two challenges. First, the accuracy of text recognition from images, audio, video, and other unstructured sources strongly affects desensitization performance. Second, conventional sensitive-information recognition methods, which rely on rules and pattern matching, often produce unacceptable results on complex financial data. To address these issues, this paper proposes a new method for unstructured data desensitization. The method first applies an evaluation model based on multi-level, fine-grained verification of text conversion accuracy to improve text recognition. It then introduces a sensitive-information recognition model based on hybrid analysis to reduce the rates of missed and false detections. The method achieved satisfactory results on real datasets.
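To make the idea of combining rule-based matching with a second, context-aware check more concrete, here is a minimal Python sketch. It is not the paper's model: the abstract does not specify the hybrid analysis, so the regular expressions, the cue-word heuristic (a simple stand-in for a learned recognizer), and the names `RULES`, `CONTEXT_CUES`, `find_sensitive`, and `desensitize` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


# Rule-based detectors. The patterns are illustrative assumptions only.
RULES = {
    # 11-digit mobile-number shape, not embedded in a longer digit run
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # 18-character ID shape: 17 digits followed by a digit or X
    "ID_CARD": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}

# Context cue standing in for a learned recognizer (assumption): a digit run
# that follows an "account" cue is flagged even if no rule above matches it.
CONTEXT_CUES = re.compile(
    r"(?:account|acct)\s*(?:no\.?|number)?\s*[:#]?\s*(\d{6,12})",
    re.IGNORECASE,
)


def find_sensitive(text):
    """Return sensitive spans found by the rules plus the context check."""
    spans = [
        Span(m.start(), m.end(), label)
        for label, pattern in RULES.items()
        for m in pattern.finditer(text)
    ]
    for m in CONTEXT_CUES.finditer(text):
        candidate = Span(m.start(1), m.end(1), "ACCOUNT")
        # Skip candidates that overlap a span already found by a rule.
        if not any(s.start < candidate.end and candidate.start < s.end for s in spans):
            spans.append(candidate)
    return sorted(spans, key=lambda s: s.start)


def desensitize(text, keep=2):
    """Mask each sensitive span, leaving its last `keep` characters visible."""
    out = []
    pos = 0
    for span in find_sensitive(text):
        if span.start < pos:
            continue  # defensive: ignore a span overlapping one already masked
        out.append(text[pos:span.start])
        value = text[span.start:span.end]
        visible = min(keep, len(value))
        out.append("*" * (len(value) - visible) + value[len(value) - visible:])
        pos = span.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(desensitize("Client phone 13812345678, account no: 88001234."))
```

In this sketch the rules catch values with a fixed shape, while the context check catches values that are only recognizable from the surrounding words. A real system would replace the cue-word heuristic with a trained recognizer and would run on text already extracted from images, audio, or video.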