Methodological Approach for Identifying Websites with Infringing Content via Text Transformers and Dense Neural Networks

Future Internet; IF 3.6; JCR Q2 (Computer Science, Information Systems); Pub Date: 2023-12-09; DOI: 10.3390/fi15120397

Aldo Hernandez-Suarez, G. Sánchez-Pérez, L. K. Toscano-Medina, Hector Perez-Meana, J. Portillo-Portillo, J. Olivares-Mercado


Citations: 0

Abstract



The rapid evolution of the Internet of Everything (IoE) has significantly enhanced global connectivity and multimedia content sharing, but it has also escalated the unauthorized distribution of multimedia content, posing risks to intellectual property rights. In 2022 alone, about 130 billion accesses to potentially non-compliant websites were recorded, underscoring the challenges for industries reliant on copyright-protected assets. Amid prevailing uncertainties and the need for technical and AI-integrated solutions, this study makes two contributions. First, it establishes a novel taxonomy aimed at safeguarding against and identifying IoE-based content infringements. Second, it proposes an architecture combining IoE components with automated sensors to compile a dataset reflective of potential copyright breaches. This dataset is analyzed using a Natural Language Processing (NLP) algorithm based on Bidirectional Encoder Representations from Transformers (BERT), further fine-tuned by a dense neural network (DNN), achieving 98.71% accuracy in identifying websites that violate copyright.
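The abstract does not give the model's layer sizes, training procedure, or code, so the following is only a minimal sketch of the general pattern it names: a fixed-size text embedding passed through a dense network that outputs a probability, plus the accuracy metric used to report results. All names (`predict_proba`, `init_layer`, `EMBED_DIM`, `HIDDEN`) and sizes are hypothetical, the weights are random and untrained, and a random vector stands in for a real transformer embedding.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_proba(embedding, params):
    """Dense head: embedding -> hidden ReLU layer -> sigmoid probability."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(embedding, w1, b1))
    return sigmoid(dense(hidden, w2, b2)[0])

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
# Hypothetical sizes; a BERT-base embedding would have 768 dimensions.
EMBED_DIM, HIDDEN = 8, 4
params = (init_layer(EMBED_DIM, HIDDEN, rng), init_layer(HIDDEN, 1, rng))

# Stand-in for a transformer embedding of one page's text (assumption).
embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
p = predict_proba(embedding, params)
label = int(p >= 0.5)  # 1 = flagged as potentially infringing
```

In a real pipeline the embedding would come from a pretrained transformer and the dense weights would be learned from labelled pages; the untrained network here only illustrates the data flow.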


Source journal

Future Internet (Computer Science: Computer Networks and Communications)

CiteScore: 7.10
Self-citation rate: 5.90%
Articles published: 303
Review time: 11 weeks

About the journal: Future Internet is a scholarly open access journal which provides an advanced forum for science and research concerned with evolution of Internet technologies and related smart systems for “Net-Living” development. The general reference subject is therefore the evolution towards the future internet ecosystem, which is feeding a continuous, intensive, artificial transformation of the lived environment, for a widespread and significant improvement of well-being in all spheres of human life (private, public, professional). Included topics are:

• advanced communications network infrastructures
• evolution of internet basic services
• internet of things
• netted peripheral sensors
• industrial internet
• centralized and distributed data centers
• embedded computing
• cloud computing
• software defined network functions and network virtualization
• cloud-let and fog-computing
• big data, open data and analytical tools
• cyber-physical systems
• network and distributed operating systems
• web services
• semantic structures and related software tools
• artificial and augmented intelligence
• augmented reality
• system interoperability and flexible service composition
• smart mission-critical system architectures
• smart terminals and applications
• pro-sumer tools for application design and development
• cyber security compliance
• privacy compliance
• reliability compliance
• dependability compliance
• accountability compliance
• trust compliance
• technical quality of basic services