Periodic watermarking for copyright protection of large language models in cloud computing security

IF 4.1 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Computer Standards & Interfaces | Pub Date: 2025-02-17 | DOI: 10.1016/j.csi.2025.103983

Pei-Gen Ye, Zhengdao Li, Zuopeng Yang, Pengyu Chen, Zhenxin Zhang, Ning Li, Jun Zheng

Volume 94, Article 103983. https://www.sciencedirect.com/science/article/pii/S0920548925000121

Citations: 0

Abstract

Large Language Models (LLMs) have become integral to content understanding and generation, driving the proliferation of Embedding as a Service (EaaS) on cloud computing platforms. EaaS uses LLMs to offer scalable, on-demand linguistic embeddings to a wide range of cloud-based applications. However, the proprietary nature of EaaS makes it a target for model extraction attacks, and when such an infringement took place often remains unknown. This paper introduces TimeMarker, a novel framework that enhances temporal traceability in cloud computing environments by embedding a distinct watermark in each sub-period, marking the first attempt to identify the timing of model extraction attacks. TimeMarker sets watermark strength adaptively, based on information entropy and frequency-domain transformations, to improve the accuracy with which model extraction attacks are detected in cloud infrastructures. The more sub-periods are used, the finer the time window in which a theft can be located. Beyond theft within a single sub-period, our approach also covers multi-sub-period scenarios, in which attackers collect data across many sub-periods to train their models. Evaluated on five widely used datasets, TimeMarker detects model extraction across sub-periods, and we assess its impact on the accuracy and robustness of large models deployed in the cloud. The results demonstrate that TimeMarker effectively identifies the different periods in which extraction attacks occur, strengthening EaaS security in cloud computing and extending traditional watermarking to copyright protection for LLMs.
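The abstract describes TimeMarker only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' method. It assumes a backdoor-style embedding watermark in which each sub-period has its own secret target vector that is blended into returned embeddings. The blending weight uses a hypothetical entropy rule (a base strength scaled by the normalized Shannon entropy of the embedding's energy). The frequency-domain transformation is omitted entirely, and watermarked outputs stand in for a model trained on them. All names (`make_targets`, `entropy_strength`, `watermark`, `detect_periods`) and parameters (`DIM`, `base`, `threshold`) are hypothetical.

```python
import math
import random

DIM = 256               # hypothetical embedding dimension
NUM_SUB_PERIODS = 4     # hypothetical number of sub-periods


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_targets(num_periods, dim, seed=0):
    """Hypothetical: one secret unit-norm target vector per sub-period."""
    rng = random.Random(seed)
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(num_periods)]


def entropy_strength(embedding, base=0.3):
    """Hypothetical adaptive strength: `base` scaled by the normalized Shannon
    entropy of the embedding's energy distribution, so the result is in [0, base]."""
    energy = [x * x for x in embedding]
    total = sum(energy)
    h = -sum((e / total) * math.log(e / total) for e in energy if e > 0)
    return base * h / math.log(len(embedding))


def watermark(embedding, period, targets):
    """Blend the current sub-period's target into the embedding, then re-normalize."""
    lam = entropy_strength(embedding)
    target = targets[period]
    return normalize([(1 - lam) * x + lam * t for x, t in zip(embedding, target)])


def detect_periods(suspect_embeddings, targets, threshold=0.1):
    """Flag every sub-period whose target has a mean cosine similarity
    with the suspect embeddings above `threshold`."""
    flagged = []
    for period, target in enumerate(targets):
        score = sum(dot(normalize(e), target) for e in suspect_embeddings)
        if score / len(suspect_embeddings) > threshold:
            flagged.append(period)
    return flagged


rng = random.Random(1)
targets = make_targets(NUM_SUB_PERIODS, DIM)
clean = [normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)]) for _ in range(200)]

# Stand-in for a stolen model: outputs copied from the service during sub-period 2.
stolen_single = [watermark(e, 2, targets) for e in clean]
# Multi-sub-period theft: half the queries fall in sub-period 1, half in sub-period 3.
stolen_multi = [watermark(e, 1 if i % 2 == 0 else 3, targets)
                for i, e in enumerate(clean)]

print(detect_periods(clean, targets))
print(detect_periods(stolen_single, targets))
print(detect_periods(stolen_multi, targets))
```

Because detection flags every sub-period that clears the threshold rather than only the best match, a multi-sub-period theft shows up as several flagged periods. Splitting the service window into more sub-periods narrows each flagged interval, which mirrors the abstract's point about granularity.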

Source journal

Computer Standards & Interfaces | Engineering & Technology, Computer Science: Software Engineering

CiteScore: 11.90

Self-citation rate: 16.00%

Articles published:

Review time: 6 months

Journal description: The quality of software, well-defined interfaces (hardware and software), the process of digitalisation, and accepted standards in these fields are essential for building and exploiting complex computing, communication, multimedia and measuring systems. Standards can simplify the design and construction of individual hardware and software components and help to ensure satisfactory interworking. Computer Standards & Interfaces is an international journal dealing specifically with these topics. The journal:
• Provides information about activities and progress on the definition of computer standards, software quality, interfaces and methods, at national, European and international levels
• Publishes critical comments on standards and standards activities
• Disseminates users' experiences and case studies in the application and exploitation of established or emerging standards, interfaces and methods
• Offers a forum for discussion on actual projects, standards, interfaces and methods by recognised experts
• Stimulates relevant research by providing a specialised refereed medium.