Pub Date: 2026-03-01 | Epub Date: 2026-02-02 | DOI: 10.1016/j.wpi.2026.102428
Julia N. Heinrich
This article develops a search strategy for a weekly alert in the field of targeted protein degradation (TPD). The task is challenging because of the rapid evolution of therapeutic approaches, the complexity of patent publication feeds, nuances in patent database indexing, language translation issues, and the tendency of inventors and attorneys to use idiosyncratic terminology. TPD is an emerging, multidisciplinary technology that uses molecules to hijack the cell's natural protein degradation pathways and redirect them toward previously “undruggable” proteins. Unlike traditional “occupancy-driven” drugs, TPD drugs rely on “event-driven” pharmacology: they trigger a specific biological event, such as degradation of the target protein or modulation of protein-protein interactions (PPIs). This article provides an overview of TPD, presents a patent search strategy for identifying small molecules that target the ubiquitin-proteasome system (UPS), and highlights the need for effective innovation tracking in this field.
Title: Patent alert for targeted protein degradation (World Patent Information, vol. 84, Article 102428)
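The abstract does not publish the alert's actual search string. The sketch below shows one common way to structure such a weekly alert, under stated assumptions. Synonyms within a concept are ORed, the concepts are ANDed, and only records published in the last seven days are kept. The concept terms, the helpers (`build_query`, `matches`, `weekly_hits`), and the record fields are hypothetical. They are not the author's strategy.

```python
from datetime import date, timedelta

# Hypothetical concept groups: synonyms inside a group are ORed, groups are ANDed.
CONCEPTS = {
    "degrader": ["PROTAC", "molecular glue", "degrader", "proteolysis targeting chimera"],
    "ups": ["ubiquitin", "E3 ligase", "proteasome", "cereblon", "VHL"],
}

def quote(term):
    # Multi-word phrases need quotes in most Boolean search syntaxes.
    return f'"{term}"' if " " in term else term

def build_query(concepts):
    """Render the concept groups as a Boolean string for a search platform."""
    groups = ["(" + " OR ".join(quote(t) for t in terms) + ")" for terms in concepts.values()]
    return " AND ".join(groups)

def matches(text, concepts):
    # Naive local check (substring match): every concept must contribute a term.
    lowered = text.lower()
    return all(any(t.lower() in lowered for t in terms) for terms in concepts.values())

def weekly_hits(records, concepts, today):
    """Keep hypothetical records published in the 7 days up to `today` that match."""
    start = today - timedelta(days=7)
    return [r for r in records
            if start < r["published"] <= today and matches(r["text"], concepts)]
```

`build_query(CONCEPTS)` yields `(PROTAC OR "molecular glue" OR degrader OR "proteolysis targeting chimera") AND (ubiquitin OR "E3 ligase" OR proteasome OR cereblon OR VHL)`. Real platforms add truncation, proximity operators, and classification codes, which this sketch leaves out.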
Pub Date: 2026-03-01 | Epub Date: 2026-02-05 | DOI: 10.1016/j.wpi.2026.102434
Mustafa Sofean
Patent identification is the process of finding patents relevant to a specific technical topic, especially in the early stages of research and development (R&D) projects. Accurately identifying relevant patents helps scientists, researchers, and industry maximize IP value, anticipate challenges, and gain insights into technological trends, competition, and future innovation opportunities. Conventional approaches, such as keyword searches or retrieval based on patent classification codes, frequently yield low precision, while supervised machine learning methods demand extensive manual annotation, a major bottleneck for domain-specific applications. In this work, we present a deep learning–based approach for domain-specific patent identification, with a focus on the plasma physics and cybersecurity domains. Our methodology employs a weak supervision paradigm to construct a high-quality training dataset by integrating multiple noisy labeling sources, including linguistic patterns, domain heuristics, and expert-defined rules. Using this synthesized training dataset, we fine-tune pre-trained transformer models, systematically optimizing hyperparameters to maximize performance. The resulting models can be deployed as automated patent identification systems tailored to specialized scientific and industrial contexts. Evaluated with standard performance metrics on previously unseen test sets, our approach achieves high accuracy and significantly outperforms a benchmark in-context learning approach based on large language models.
Title: Identification of domain-relevant patents via weakly supervised deep learning (World Patent Information, vol. 84, Article 102434)
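The abstract describes combining several noisy labeling sources into a single training set. The sketch below shows that idea under a simplifying assumption: the sources are aggregated by an unweighted majority vote, with ties and no votes treated as abstentions. The paper may instead use a learned label model. The three labeling functions and their keywords are hypothetical stand-ins for the linguistic patterns, domain heuristics, and expert rules mentioned above.

```python
from collections import Counter

ABSTAIN, NEG, POS = -1, 0, 1

# Hypothetical labeling functions for a plasma-physics relevance filter.
def lf_keyword(text):
    # Linguistic pattern: explicit domain vocabulary.
    return POS if "plasma" in text.lower() else ABSTAIN

def lf_fusion_device(text):
    # Domain heuristic: fusion devices strongly suggest plasma physics.
    return POS if any(w in text.lower() for w in ("tokamak", "stellarator")) else ABSTAIN

def lf_blood_plasma(text):
    # Expert rule: "blood plasma" is a medical, not a physics, usage.
    return NEG if "blood plasma" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_keyword, lf_fusion_device, lf_blood_plasma]

def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining functions; ties or no votes give ABSTAIN."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

def build_training_set(texts, lfs=LABELING_FUNCTIONS):
    # Keep only documents that received a non-abstaining weak label.
    labeled = ((t, weak_label(t, lfs)) for t in texts)
    return [(t, y) for t, y in labeled if y != ABSTAIN]
```

A text mentioning "blood plasma" receives a POS vote and a NEG vote and is dropped as a tie. In the described pipeline, the surviving weakly labeled set would then be used to fine-tune a transformer classifier, which is not shown here.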
Pub Date: 2026-03-01 | Epub Date: 2025-12-20 | DOI: 10.1016/j.wpi.2025.102423
Milad Armani Dehghani, Mehmet Sahiner, Noptanit Chotisarn
Patents are critical indicators of innovation, especially in fast-evolving domains like Financial Technology (FinTech). However, accurately predicting patent grant outcomes with modern artificial intelligence techniques has remained challenging. This study addresses that gap by applying state-of-the-art machine learning (ML), including ensemble methods and deep learning models, to a dataset of 20,008 FinTech patent applications from 2000 to 2020. We demonstrate that our ML framework can forecast grant success with high precision (up to 89%), revealing that patent quality and strategic filing choices, such as optimal IPC classes and jurisdictions, are key determinants of grant probability. The findings highlight practical implications for innovators and intellectual property managers, such as better resource allocation and informed patent strategy decisions. Overall, this work introduces a novel, AI-driven approach to patent analytics in FinTech, offering a forward-looking tool to enhance innovation management and strategic IP planning.
Title: From filing to grant: Predicting patent outcomes in FinTech using a predictive analytics perspective (World Patent Information, vol. 84, Article 102423)
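The headline result is a precision score. The sketch below illustrates what "up to 89% precision" measures by computing precision, recall, and F1 for a binary grant prediction. The labels are made-up toy values, not data from the study, and treating "granted" as the positive class is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1, with `positive` (granted) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (illustrative only): 1 = granted, 0 = not granted.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
```

In this toy case, six applications are predicted to be granted and five of them were, so precision is 5/6 (about 83%). Because precision ignores grants the model missed, it is usually reported alongside recall.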
Pub Date: 2026-03-01 | Epub Date: 2025-12-08 | DOI: 10.1016/j.wpi.2025.102419
Jeong-sang Eom, Dong-chan Kim, Ji-hun Han, Won-Gyu Bae
Offshore wind energy is emerging as a pivotal energy resource, and as turbine dimensions expand to meet growing power demands, structural requirements for support towers have intensified. This has led to the use of thicker steel plates, introducing challenges such as microstructural inhomogeneity from uneven cooling across plate thicknesses. To address these issues, we conducted a comprehensive patent analysis on heavy steel plate technologies to identify technological gaps and track innovation trends. We developed a classification framework to organize production methods aimed at enhancing mechanical properties. Additionally, we assessed average steel plate thicknesses across countries and companies, reflecting the trend towards larger turbines and towers. Patent impact and market potential were evaluated using the Cites Per Patent (CPP) and Patent Family Size (PFS) indices.
Title: Enhancing mechanical performance of thick steel plates for offshore wind structures: A classification and patent landscape study (World Patent Information, vol. 84, Article 102419)
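CPP and PFS are averages over a set of patents. The sketch below assumes the common definitions:
- CPP is the forward citations received divided by the number of patents.
- PFS is the mean number of family members per patent.

It computes both per group, for example per assignee or per country. The paper may define or normalize these indices differently. The records and field names are hypothetical.

```python
from collections import defaultdict

def cpp_pfs_by_group(patents, key):
    """Return {group: (CPP, PFS)}: CPP = citations / patents, PFS = mean family size."""
    cites = defaultdict(int)
    family = defaultdict(int)
    count = defaultdict(int)
    for p in patents:
        g = p[key]
        count[g] += 1
        cites[g] += p["forward_citations"]
        family[g] += p["family_size"]
    return {g: (cites[g] / count[g], family[g] / count[g]) for g in count}

# Hypothetical records for illustration only.
patents = [
    {"assignee": "A", "forward_citations": 12, "family_size": 5},
    {"assignee": "A", "forward_citations": 4, "family_size": 3},
    {"assignee": "B", "forward_citations": 2, "family_size": 8},
]
indices = cpp_pfs_by_group(patents, "assignee")
```

Here assignee A has a higher citation impact (CPP 8.0 versus 2.0), while B protects its single invention in more jurisdictions (PFS 8.0 versus 4.0). That contrast is the kind of distinction the two indices are meant to separate.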
Pub Date: 2026-03-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.wpi.2026.102426
Nathan Monnet, Loïc Maréchal
We introduce a novel approach to text classification by combining doc2vec embeddings with advanced clustering techniques to improve the analysis of specialized, high-dimensional textual data. We integrate unsupervised methods such as Louvain, K-means, and Spectral clustering with doc2vec to enhance the detection of semantic patterns across a large corpus. As a case study, we apply this methodology to cybersecurity risk analysis using the MITRE ATT&CK framework to structure and reduce the dimensionality of cyberattack tactics. Louvain clustering proved the most effective among the tested methods, achieving the best balance between cluster coherence and computational efficiency. Our approach identifies four “super tactics”, demonstrating how clustering improves thematic coherence and risk attribution. The results validate the utility of combining doc2vec with clustering, particularly Louvain, for enhancing topic modelling and text classification.
Title: Clustering doc2vec output for topic-dimensionality reduction: A MITRE ATT&CK calibration (World Patent Information, vol. 84, Article 102426)
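Louvain clustering operates on a graph and greedily maximizes modularity. The sketch below covers the two pieces around that search:
- It turns document vectors (for example, doc2vec outputs) into a weighted graph, assuming a cosine-similarity threshold.
- It computes the standard Newman modularity score that Louvain optimizes.

The threshold and helper names are illustrative. The Louvain search itself is not reimplemented here; NetworkX 2.8+ ships it as `louvain_communities`.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_edges(vectors, threshold):
    """Undirected weighted edges between documents whose cosine similarity exceeds threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s > threshold:
                edges.append((i, j, s))
    return edges

def modularity(edges, labels):
    """Newman modularity Q = sum_c [w_in(c)/m - (deg(c)/2m)^2]; labels[i] is node i's cluster."""
    m = sum(w for _, _, w in edges)
    if m == 0:
        return 0.0
    internal = defaultdict(float)  # total edge weight inside each cluster (each edge once)
    degree = defaultdict(float)    # total weighted degree of each cluster
    for i, j, w in edges:
        degree[labels[i]] += w
        degree[labels[j]] += w
        if labels[i] == labels[j]:
            internal[labels[i]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

On two disconnected triangles, the partition into the two triangles scores Q = 0.5, while putting all six nodes in one cluster scores 0. Louvain searches for partitions with high Q by moving nodes between clusters and then merging clusters.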
Pub Date: 2026-03-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.wpi.2025.102421
K.C. Pantoja, V.S. Tarabal, M.E.J. Oliveira, A.G.S. Oliveira, C.L.V. Silva, P.F. Nascimento, T.A. França, R.I.M.A. Ribeiro, J.A. Dernowsek, P.A. Granjeiro
Three-dimensional (3D) bioprinting is emerging as a high-complexity technology in the field of biofabrication, integrating interdisciplinary principles from engineering, materials science, cell biology, and regenerative medicine. This technique enables the fabrication of functional biological constructs composed of living cells and biomaterials through additive manufacturing methods with high spatial resolution. This article provides an in-depth analysis of the main applications, recent advances, and technical limitations related to 3D bioprinting, with emphasis on its implementation in bioprocesses. In the biomedical context, significant progress has been observed in tissue engineering and 3D disease modeling, particularly in translational oncology and the development of predictive drug screening platforms. In industrial biotechnology, bioprinting has been employed for the production of high-purity biological inputs, such as extracellular matrix (ECM) proteins, using human cell systems, thereby promoting more sustainable, animal-free production routes. In the food industry, this technology allows the development of personalized and nutritionally tailored products incorporating innovative and environmentally sustainable ingredients, such as microalgae and insects. In the agricultural sector, 3D bioprinting has been applied to plant tissue engineering and the design of biomimetic models to optimize crop systems. Additionally, a patentometric analysis highlights the global expansion of 3D bioprinting, with a notable increase in filings across international jurisdictions and a gradual transition toward technological maturity. The findings underscore the strategic role of 3D bioprinting as a driver of technological innovation with significant impacts on health, sustainability, and the bioeconomy.
Title: Global patent panorama of 3D bioprinting: Trends, maturity and key stakeholders (World Patent Information, vol. 84, Article 102421)
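The patentometric finding of growing filings across jurisdictions rests on counts of filings by year and by office. The sketch below tallies both and computes year-over-year growth. It assumes each record carries a filing year and a filing jurisdiction. The field names and data are hypothetical.

```python
from collections import Counter

def filings_per_year(records):
    return dict(sorted(Counter(r["year"] for r in records).items()))

def filings_per_office(records):
    return Counter(r["office"] for r in records)

def yoy_growth(per_year):
    """Relative change versus the previous year present in the data, keyed by the later year."""
    years = sorted(per_year)
    return {y: (per_year[y] - per_year[p]) / per_year[p]
            for p, y in zip(years, years[1:]) if per_year[p]}

# Hypothetical filing records for illustration only.
records = [
    {"year": 2020, "office": "US"}, {"year": 2020, "office": "CN"},
    {"year": 2021, "office": "US"}, {"year": 2021, "office": "EP"},
    {"year": 2021, "office": "CN"},
    {"year": 2022, "office": "CN"}, {"year": 2022, "office": "JP"},
    {"year": 2022, "office": "US"}, {"year": 2022, "office": "KR"},
]
per_year = filings_per_year(records)
growth = yoy_growth(per_year)
offices = filings_per_office(records)
```

Applications are typically published about 18 months after the priority date, so the most recent years in any such count are incomplete. Trend statements usually discount those years.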
Pub Date: 2026-03-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.wpi.2025.102422
Hady M. Khawand, Markus Kittler, Elie Chahda
This study assesses the level of intellectual property (IP) awareness among top executives in small and medium-sized enterprises (SMEs) within the Gulf Cooperation Council (GCC) region. It addresses a notable gap in the literature on IP familiarity and its strategic use in emerging markets. We surveyed 526 executives across the six GCC states, with scales developed to measure IP familiarity, perception of IP's importance, and understanding of central IP concepts (trademarks, patents, copyrights). Statistical analysis reveals a significant lack of IP awareness, particularly in fundamental areas like patent protection and territorial limitations, underscoring potential risks to strategic decision-making and growth. The findings demonstrate a strong, positive correlation between participation in IP-related education and familiarity with IP concepts, yet most executives lack practical understanding of IP's strategic value. Tailored IP education—through workshops, university courses, and industry conferences—is recommended to bridge this gap, aligning executive knowledge with international standards and fostering an innovation-driven business environment in the GCC.
Title: Intellectual property awareness in the Gulf region (World Patent Information, vol. 84, Article 102422)
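The reported link between IP education and IP familiarity is a correlation. The abstract does not name the exact statistic, so the sketch below assumes Pearson's r. With a 0/1 participation variable, Pearson's r is the point-biserial correlation. The survey values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical responses: attended IP training (1) or not (0), and a 1-5 familiarity score.
attended = [1, 1, 1, 0, 0, 0, 1, 0]
familiarity = [4, 5, 4, 2, 1, 2, 3, 3]
r = pearson_r(attended, familiarity)
```

For these toy values, r = 2/√6 ≈ 0.82. On Python 3.10+, `statistics.correlation` computes the same coefficient. A significance test would still be needed before calling the association significant.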
Pub Date: 2026-03-01 | Epub Date: 2026-01-02 | DOI: 10.1016/j.wpi.2025.102424
Susan Bates
Welcome to the latest quarterly Literature Listing, a current awareness service that points readers to newly published books and journal and conference articles on IP management; Information Retrieval Techniques; Patent Landscapes; Education & Certification; and Legal & Intellectual Property Office Matters. The current Literature Listing was compiled in mid-November 2025. Key resources include Scopus, Digital Commons, publishers' RSS feeds, and serendipity! This article gives a selection of interesting references to whet your appetite; the full list of references can be found in the companion datafile.
Title: Literature listing (World Patent Information, vol. 84, Article 102424)
Pub Date: 2026-03-01 | Epub Date: 2026-01-28 | DOI: 10.1016/j.wpi.2026.102433
Renukswamy Chikkamath, Linda Andersson, Markus Endres
Semantic search with embedding models offers an alternative to traditional keyword-based patent retrieval but often struggles with computational cost and efficiency in real-time scenarios compared to methods like BM25. Meanwhile, the rapid advancement of language models raises questions about the necessity of domain-specific models versus the viability of general-purpose ones. This work presents a comprehensive evaluation of embedding-based patent search using the CLEF-IP 2011 dataset. We assess 10 configurations employing language models as retrievers, re-rankers, or hybrids, across 9 models, both patent-specific and general-purpose, tested in 105 experimental setups. Our best configurations deliver a 14.81% absolute MAP improvement over state-of-the-art baselines and outperform patent-specific embeddings by at least 28.95% in MAP. We further show that embedding quantization enables large-scale patent search with up to 30× faster retrieval and 32× lower memory usage. These results provide practical guidance for integrating embedding models into patent prior art search while addressing performance and scalability constraints.
Title: Rethinking patent retrieval with language models: Toward scalable and efficient search (World Patent Information, vol. 84, Article 102433)
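The 32× memory figure matches what binary quantization gives: float32 embeddings use 32 bits per dimension, and binary codes use 1 bit. The abstract does not say which quantization scheme was used, so the sketch below is an assumption. It keeps the sign of each dimension, packs the bits into an integer, and ranks documents by Hamming distance. The toy vectors are hypothetical.

```python
def binarize(vec):
    """Pack the sign of each dimension into an int: bit i is 1 if vec[i] > 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    # Number of differing bits; int.bit_count() is equivalent on Python 3.10+.
    return bin(a ^ b).count("1")

def search(query_vec, doc_codes, k=2):
    """Indices of the k documents whose binary codes are closest to the query (ties by index)."""
    q = binarize(query_vec)
    return sorted(range(len(doc_codes)), key=lambda i: (hamming(q, doc_codes[i]), i))[:k]

def bytes_float32(dims):
    return 4 * dims          # 32 bits per dimension

def bytes_binary(dims):
    return (dims + 7) // 8   # 1 bit per dimension, rounded up to whole bytes

# Hypothetical 4-dimensional embeddings.
docs = [[0.3, -0.1, 0.8, -0.5], [-0.2, 0.4, -0.7, 0.1], [0.1, -0.3, 0.5, 0.2]]
codes = [binarize(d) for d in docs]
top = search([0.5, -0.2, 0.9, -0.1], codes)
```

For a 768-dimensional embedding, storage drops from 3072 bytes to 96 bytes per vector. Binary search is often used as a fast first pass whose shortlist is then rescored with full-precision vectors, which trades a little speed for accuracy.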
Pub Date: 2026-03-01 | DOI: 10.1016/j.wpi.2026.102435
Yaojia Mu, Jianhua Wang, Huaxiang Zhang, Zhongxue Gan, Guo-Niu Zhu
Patents play a pivotal role in engineering design by safeguarding innovation, forecasting technical trends, and promoting knowledge sharing. However, the vast volume of patents and their complex technical descriptions pose significant challenges for effective analysis and information retrieval. To address these issues, we propose an integrated framework that combines large language models (LLMs) and a BERT-refined approach for patent analysis. Specifically, patent titles and abstracts are first collected, and term frequency-inverse document frequency (TF-IDF) is used to extract candidate keyphrases. An LLM then refines these candidates by filtering out irrelevant terms and identifying significant keywords. Next, a fine-tuned BERT model performs named entity recognition (NER) to extract domain-specific keywords, which are further refined into keyphrases through our BERT-refined keyphrase extraction (BRKE) method. Experimental results on a large dataset of USPTO patents demonstrate the effectiveness of BRKE: it achieves the highest F1-score, 52.97%, when the top-10 keyphrases are retained, outperforming KeyBERT, YAKE, and RAKE by 9.52%, 6.1%, and 2.35%, respectively. By improving the accuracy of patent keyphrase extraction, this work makes patent analysis more efficient and accessible to both analysts and design engineers.
Title: Towards efficient patent analysis: A large language model and BERT-refined methodology for keyphrase extraction (World Patent Information, vol. 84, Article 102435)
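The first BRKE stage ranks candidate keyphrases with TF-IDF. The sketch below implements only that stage, without dependencies, under these assumptions:
- Candidates are lowercase word n-grams (n = 1 to 2) that do not start or end with a stopword.
- The stopword list is small and hypothetical.
- The idf is log(N / df) + 1, one common variant.

The LLM filtering and BERT NER stages are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "is", "with", "by", "on"}

def candidates(text, max_n=2):
    """Word n-grams (n <= max_n) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                grams.append(" ".join(gram))
    return grams

def tfidf_keyphrases(docs, k=3, max_n=2):
    """Top-k candidates per document by tf * (log(N / df) + 1); ties broken alphabetically."""
    doc_grams = [Counter(candidates(d, max_n)) for d in docs]
    df = Counter(g for grams in doc_grams for g in grams)  # document frequency
    n_docs = len(docs)
    results = []
    for grams in doc_grams:
        total = sum(grams.values())
        scores = {g: (c / total) * (math.log(n_docs / df[g]) + 1) for g, c in grams.items()}
        results.append(sorted(scores, key=lambda g: (-scores[g], g))[:k])
    return results

# Hypothetical mini-corpus of patent titles.
docs = [
    "A gear pump with a gear housing for hydraulic fluid",
    "An electric pump for coolant",
    "A battery housing for electric vehicles",
]
top = tfidf_keyphrases(docs)
```

Terms shared across documents, such as "pump" and "housing", are down-weighted, while document-specific phrases such as "electric pump" rise. The noise that survives this stage is what the later filtering stages are meant to remove.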