A mobile edge computing-focused transferable sensitive data identification method based on product quantization

Xinjian Zhao, Guoquan Yuan, Shuhan Qiu, Chenwei Xu, Shanming Wei
Journal of Cloud Computing (journal article), published 2024-05-08
DOI: 10.1186/s13677-024-00662-4 (https://doi.org/10.1186/s13677-024-00662-4)
Citations: 0

Abstract

Sensitive data identification is the first and a crucial step in protecting sensitive information. As the industrial internet evolves and interconnects sectors such as the electric power industry, sensitive data increasingly crosses domain boundaries, which changes its composition. Traditional approaches that rely on sensitive vocabularies therefore struggle to identify sensitive data in an era of information abundance. Drawing on advances in deep-learning-based natural language processing, we propose a transferable Sensitive Data Identification method based on Product Quantization, named PQ-SDI. The approach uses both the composition and the contextual cues of textual data to identify sensitive information in Mobile Edge Computing (MEC) settings. PQ-SDI performs well within a single domain and, after training on heterogeneous datasets, also adapts to new domains. The method identifies sensitive data automatically throughout the entire process, eliminating the need for manual maintenance of sensitive vocabularies. In extensive experiments on four real-world datasets, PQ-SDI improves performance by 2% to 5% over the baseline model and achieves an accuracy of up to 94.41%. In cross-domain trials, PQ-SDI achieves accuracy comparable to that of training and identification within the same domain. Our experiments further show that product quantization reduces the parameter size for the subsequent sensitive data identification phase by a factor of tens, which is particularly beneficial in the resource-constrained environments characteristic of MEC. This advantage not only strengthens sensitive data protection but also mitigates the risk of data leakage during transmission, enhancing overall security in MEC environments.
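The abstract does not say which parameters PQ-SDI quantizes or with what settings, so the following is only a generic sketch of product quantization on a toy table of vectors, not the authors' implementation. All function names (`pq_fit`, `pq_encode`, `pq_decode`) and the values of `N`, `D`, `M`, and `K` are illustrative assumptions. Each vector is split into `M` sub-vectors, a small k-means codebook is learned per split, and each vector is then stored as `M` codebook indices instead of `D` floats.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means returning k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def pq_fit(vectors, m, k):
    """Learn one k-entry codebook for each of the m sub-vector positions."""
    d = len(vectors[0])
    if d % m != 0:
        raise ValueError("vector dimension must be divisible by m")
    sub = d // m
    return [kmeans([v[i * sub:(i + 1) * sub] for v in vectors], k, seed=i)
            for i in range(m)]

def pq_encode(v, codebooks):
    """Replace each sub-vector of v by the index of its nearest centroid."""
    sub = len(v) // len(codebooks)
    return [nearest(v[i * sub:(i + 1) * sub], cb)
            for i, cb in enumerate(codebooks)]

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    return [x for c, cb in zip(codes, codebooks) for x in cb[c]]

# Toy data standing in for a parameter table (hypothetical sizes).
rng = random.Random(42)
N, D, M, K = 200, 8, 4, 16
vectors = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]

codebooks = pq_fit(vectors, M, K)
codes = [pq_encode(v, codebooks) for v in vectors]
approx = pq_decode(codes[0], codebooks)  # lossy reconstruction of vectors[0]

original_floats = N * D             # floats stored without quantization
codebook_floats = M * K * (D // M)  # floats stored in the shared codebooks
print(original_floats, codebook_floats, len(codes[0]))
```

In this toy setup the 1600 original floats are replaced by 128 codebook floats plus four small integer codes per vector, each of which fits in 4 bits because `K` is 16. The ratio here is an artifact of the chosen toy sizes and says nothing about the reduction reported for PQ-SDI; the reconstruction is also lossy, which is the accuracy trade-off the paper's experiments evaluate.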