Pub Date: 2025-09-01; Epub Date: 2025-08-04; DOI: 10.1016/j.wpi.2025.102382
Kyungdae Oh, Youngbo Choi, Surin Hong
This study presents a comprehensive patent-based analysis of mRNA vaccine technologies, tracing their progression from experimental tools to scalable biomedical platforms after the COVID-19 pandemic. Leveraging Cooperative Patent Classification (CPC) codes and 25 years of global filings (2001–2025), we built a functional technology tree and mapped innovation across delivery systems, structural design, adjuvants and immune modulation, and Good Manufacturing Practice (GMP)-compliant manufacturing. Lipid nanoparticle-mediated delivery dominates recent applications, underscoring industry priorities in efficacy and scale. Growth curves signal entry into technological maturity, accompanied by wider participation from pharmaceutical firms, academia, and public institutes. Strategic profiling reveals contrasting R&D strategies: ModernaTX and Translate Bio pursue vertically integrated platforms; CureVac emphasizes antigen design and RNA stability; MIT focuses on delivery technologies with broad cross-domain reach. These patterns indicate that mRNA vaccines are becoming foundational infrastructure for precision medicine, oncology, and next-generation immunotherapies. Future competition is poised to intensify around delivery innovation, RNA stabilization, immune modulation, and robust GMP production. Our findings illuminate evolving intellectual-property strategies and highlight platform integration, manufacturing optimization, and cross-sector collaboration as key drivers of innovation in the global mRNA vaccine ecosystem.
Title: Patent landscape and innovation trajectories of mRNA vaccine technologies. World Patent Information, vol. 82, Article 102382.
Pub Date: 2025-09-01; Epub Date: 2025-08-27; DOI: 10.1016/j.wpi.2025.102387
Aleksei L. Kalinichenko, Kelvin W. Willoughby
This study proposes a new patent search methodology for enhancing the quality and utility of patent research. The methodology focuses on techniques for effectively searching large patent datasets using artificial intelligence (AI)-based classifiers to generate robust and reproducible results for subsequent statistical analysis. An extensive literature review revealed that prevailing approaches to patent searching fail to provide transparent, accurate and reproducible results, which hinders validation and creates a need for manual post-processing and subjective judgments. To enable precise, reliable and reproducible AI-enabled search queries, our proposed methodology employs a novel terminological framework and formulates search regulations based on a formal definition of the technological subject matter of interest. We tested the methodology by applying it to patent searches in the field of AI technologies. In other words, we employed AI to facilitate our development of an operational technical definition of AI for patent searches. The primary results of our research are: (1) an automated patent search technique utilizing a learning algorithm guided by a formal definition of the search area; and (2) a novel terminological framework tailored for patent searches in the AI technology domain. Our approach offers enhanced transparency, reproducibility, and reliability in patent research, with applicability to both AI and other fields of technology.
Title: The effective use of artificial intelligence in patent searches: A case study in using AI-based classifiers to identify AI inventions. World Patent Information, vol. 82, Article 102387.
Pub Date: 2025-09-01; Epub Date: 2025-06-05; DOI: 10.1016/j.wpi.2025.102358
David Rees, Manuel Wirz
The rapid development of patent search tools, most of which now include a results ranking system as a matter of course, has revolutionized the speed with which relevant prior art documents can be found compared to traditional search techniques. This short study builds upon a substantial body of work which investigates the use of AI and semantic engines in prior art searching and makes use of unambiguous, objective, yes/no criteria in performance assessment and tool comparison.
Title: Evaluating the effectiveness of ranking-based patent search engines for identifying relevant prior art: A comparative study in the area of chemistry. World Patent Information, vol. 82, Article 102358.
Pub Date: 2025-09-01; Epub Date: 2025-06-24; DOI: 10.1016/j.wpi.2025.102365
Tony Trippe (Associate Editor), Jieh-Sheng Jason Lee (Assistant Professor)
Title: Special issue on applications of Generative AI and Large Language Models in the patent domain. World Patent Information, vol. 82, Article 102365.
Patent classification faces increasingly complex challenges due to the exponential growth in volume and technical sophistication of global patent databases. A substantial proportion of patents inherently belong to multiple technological categories simultaneously, rendering classification particularly challenging for both manual and automated systems. Current approaches struggle with computational scalability, prohibitive annotation costs, and the accurate identification of overlapping technical concepts across interdisciplinary innovations. This study presents a novel iterative framework that combines the advanced text comprehension capabilities of Large Language Models (LLMs) with the sample-efficient principles of active learning (AL) for scalable multi-label patent classification. We evaluated our approach using drone-related technologies extracted from a comprehensive dataset of 100,000 patents, focusing on ten key technological component categories. Our LLM-assisted active learning methodology achieved Macro-F1 and Micro-F1 scores of 0.85 and 0.88, respectively, demonstrating a 15% improvement in Macro-F1 compared to established baseline methods. Our approach reduced the required manual annotation effort by approximately 60% while maintaining comparable classification performance. These empirical findings demonstrate the potential for transforming large-scale patent analysis workflows and improving the efficiency of intellectual property management systems.
Title: Scalable multi-label patent classification via iterative large language model-assisted active learning. Authors: Songquan Xiong, Shikun Chen, Jianwei He, Yangguang Liu, Junjie Mao, Chao Liu. DOI: 10.1016/j.wpi.2025.102380. Pub Date: 2025-09-01. World Patent Information, vol. 82, Article 102380.
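For readers unfamiliar with the two metrics quoted in the abstract above, the following is a minimal sketch of how Macro-F1 and Micro-F1 are conventionally computed for multi-label predictions. It is not the authors' evaluation code; the function names, label names, and toy data are all hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for multi-label predictions."""
    per_label = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        per_label.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_label) / len(per_label)   # unweighted mean over labels
    micro = f1(total_tp, total_fp, total_fn)  # pooled counts over all labels
    return macro, micro


# Hypothetical toy data: four documents, three illustrative component labels.
labels = ["battery", "rotor", "camera"]
y_true = [{"battery", "rotor"}, {"rotor"}, {"camera"}, {"battery"}]
y_pred = [{"battery"}, {"rotor", "camera"}, {"camera"}, {"battery", "rotor"}]

macro, micro = macro_micro_f1(y_true, y_pred, labels)
print(f"{macro:.3f} {micro:.3f}")  # prints "0.722 0.727"
```

Macro-F1 weights every label equally, so rare categories influence it as much as common ones, whereas Micro-F1 pools counts across labels and is dominated by frequent categories. That difference is why multi-label studies usually report both.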
Pub Date: 2025-09-01; Epub Date: 2025-06-20; DOI: 10.1016/j.wpi.2025.102374
Susan Bates (Independent Researcher)
Welcome to the latest quarterly Literature Listing, intended as a current awareness service for readers, indicating newly published books, journal articles, and conference articles on IP management; Information Retrieval Techniques; Patent Landscapes; Education & Certification; and Legal & Intellectual Property Office Matters. The current Literature Listing was compiled mid-May 2025. Key resources include Scopus, Digital Commons, publishers' RSS feeds, and serendipity! This article gives a selection of interesting references to whet your appetite; the full list of references can be found in the companion datafile.
Title: Literature listing. World Patent Information, vol. 82, Article 102374.
Pub Date: 2025-09-01; Epub Date: 2025-08-16; DOI: 10.1016/j.wpi.2025.102385
Gülfem Özmen, Jussi Heikkilä, Matti Karvonen, Ville Ojanen
We present empirical evidence on the digital marketing choices of standard essential patent licensing programs on patent pool and licensor websites. We highlight the importance of dynamic learning in licensing negotiation events and strategic information revelation in the presence of asymmetric information. We document licensing schemes and licensed units adopted in patent licensing programs. We analyze the marketing strategies of licensing programs using applicable elements of the Marketing Mix framework. We observe significant variation in publicly available information across licensing programs. This suggests that licensors face trade-offs in deciding what information is revealed and anchored in pre-negotiations, as part of licensing program marketing, and during confidential licensing negotiations. Future studies could analyze how generative artificial intelligence (AI) systems may promote marketing and transparency of patent licensing programs.
Title: Digital marketing of standard essential patent licensing programs. World Patent Information, vol. 82, Article 102385.
Pub Date: 2025-09-01; Epub Date: 2025-08-13; DOI: 10.1016/j.wpi.2025.102383
Tzu-Yu Lin, Li-Chieh Chou
Purpose
This study aims to systematically synthesize the practical applications of artificial intelligence (AI) in patent analysis by constructing a comprehensive matrix that aligns distinct AI techniques with their corresponding analytical tasks. The “AI Technique and Analytical Task” matrix provides a structured framework for understanding how various AI approaches are deployed across different functional objectives within the patent analysis domain.
Design/methodology/approach
This study integrates bibliometric analysis, BERT-based topic modeling, and literature review to explore AI applications in patent analysis. Data were retrieved from the Web of Science Core Collection using a dual-focus search strategy targeting AI techniques and patent analysis tasks. A clear distinction was made to exclude studies analyzing AI trends using patent data, retaining only those applying AI methods to patent analytics. With these strategies, 718 relevant publications were selected as the basis for analysis.
Findings
The results reveal exponential growth in AI-powered patent analysis research since the mid-2010s, with Technological Forecasting and Social Change (TFSC), Scientometrics, and World Patent Information (WPI) identified as the leading publication platforms. Geographical analysis shows that China and South Korea have rapidly increased their research output and institutional engagement, while the U.S. maintains a foundational yet less recent presence.
Using topic modeling, this study identified eleven major thematic clusters, spanning tasks such as emerging knowledge discovery, technology forecasting, and opportunity identification. These were integrated into the “AI Technique and Analytical Task” matrix, which systematically maps the relationships between AI methods (such as pretrained language models, convolutional neural networks, semantic analysis, and topic modeling) and their practical implementations. Among these, patent classification and natural language processing (NLP) emerged as the most impactful applications, underscoring AI's vital role in enabling scalable, data-driven approaches to managing complex patent information.
Originality
This study presents a novel integration of multi-layered literature retrieval strategies, bibliometric analysis, BERT-based topic modeling, and an AI technique-to-analytical task matrix to construct a systematic and structured knowledge framework. This integrative approach not only delineates the interdisciplinary evolution of AI applications in patent analysis but also provides strategic guidance for future research, particularly in advancing empirical validation, informing policy applications, and promoting global inclusivity in this emerging field.
Title: A systematic review of artificial intelligence applications and methodological advances in patent analysis. World Patent Information, vol. 82, Article 102383.
Pub Date: 2025-06-01; Epub Date: 2025-04-25; DOI: 10.1016/j.wpi.2025.102360
Volker D. Hähnke, Arnaud Wéry, Matthias Wirth, Alexander Klenner-Bajaja
Patents are organized using systems of technical concepts like the Cooperative Patent Classification (CPC). Classification information is extremely valuable for patent professionals, particularly for patent search. Language models have proven useful in Natural Language Processing tasks, including document classification. Generally, pre-training on a domain is essential for optimal downstream performance. Currently, there are no models pre-trained on patents with sequence length above 512. We pre-trained a RoBERTa model with sequence length 1024, increasing the share of fully covered claims sections from 12% to 53%. It has a ‘base’ configuration, reducing the number of free parameters three-fold compared to ‘large’ models in the patent domain. We fine-tuned the model on classification tasks in the CPC, up to leaf level. Our tokenizer produces sequences that are on average 5%, and up to 10%, shorter than those of the general English RoBERTa tokenizer. With our pre-trained ‘base’ size model, we reach classification performance better than general English models and comparable to ‘large’ models pre-trained on patents. On the finest CPC granularity, 88% of test documents have at least one ground truth symbol in the top 10 predictions. Our CPC prediction models and data sets are publicly accessible. With the described procedures, we can periodically repeat pre-training and fine-tuning to cope with drift effects.
Title: Encoder models at the European Patent Office: Pre-training and use cases. World Patent Information, vol. 81, Article 102360.
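The top-10 figure in the abstract above (the share of test documents with at least one ground truth symbol among the ten highest-ranked predictions) is a hit-rate-at-k style measure. The sketch below shows one plausible way such a number could be computed; it is not the authors' code, and the function name, score dictionaries, and the use of k=2 on three toy documents are assumptions made purely for illustration. The CPC subclass symbols serve only as placeholder keys.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    scores: List[Dict[str, float]], truth: List[Set[str]], k: int = 10
) -> float:
    """Fraction of documents with at least one true symbol in the top-k predictions."""
    hits = 0
    for doc_scores, true_symbols in zip(scores, truth):
        # Rank candidate symbols by predicted score, highest first, and keep k.
        top_k = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        if true_symbols.intersection(top_k):
            hits += 1
    return hits / len(truth) if truth else 0.0


# Hypothetical toy example: three documents, three candidate symbols, k = 2.
scores = [
    {"G06F": 0.9, "H04L": 0.5, "B64C": 0.1},
    {"G06F": 0.2, "H04L": 0.7, "B64C": 0.6},
    {"G06F": 0.8, "H04L": 0.1, "B64C": 0.3},
]
truth = [{"H04L"}, {"G06F"}, {"B64C", "H04L"}]

rate = hit_rate_at_k(scores, truth, k=2)
print(f"{rate:.2f}")  # prints "0.67"
```

A measure of this kind rewards a model for placing any correct symbol near the top of the ranking, which matches a search-support use case where a classifier suggests candidate classes for a human to confirm.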