Leveraging NLP and web knowledge graphs to harmonize locations: A case study on US patent transactions
Authors: Grazia Sveva Ascione, Andrea Vezzulli
Journal: World Patent Information, volume 79, Article 102320 (Impact Factor 2.2; JCR Q2, Information Science & Library Science)
Publication date: 2024-11-07
DOI: 10.1016/j.wpi.2024.102320
URL: https://www.sciencedirect.com/science/article/pii/S0172219024000607
Citations: 0
Abstract
In this study, we introduce a novel methodology for harmonizing and standardizing the locations associated with patent transactions recorded at the USPTO from 2005 to 2022. Combining natural language processing (NLP) techniques with search engine-based web knowledge graphs, the method comprises four phases: data pre-processing, semantic clustering, exploitation of web knowledge graphs, and API-driven harmonization. Starting from a dataset of 63,838 unique locations, the methodology reduces this number by more than 50%. The approach achieves an accuracy rate of approximately 92%. The resulting geolocated dataset of companies’ patent transactions offers a valuable resource for fine-grained geographical analyses of the markets for technology; in particular, we provide examples of the economic insights that can be drawn from the geographical patterns of those transactions.
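As a rough illustration of the first two phases named in the abstract (pre-processing and clustering of location strings), the sketch below normalizes raw strings and groups near-duplicates. It is not the authors' implementation: the abstract does not specify their NLP model, so character-level similarity from Python's standard `difflib` is swapped in as a stand-in for semantic clustering. The function names `normalize` and `cluster`, the 0.85 threshold, and the sample strings are all assumptions made for this example.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(location: str) -> str:
    """Pre-processing: strip accents and punctuation, lowercase, collapse spaces."""
    text = unicodedata.normalize("NFKD", location)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def cluster(locations: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy clustering: attach each string to the first cluster whose
    normalized representative is at least `threshold` similar.

    Assumption: string similarity stands in for the semantic similarity
    used in the paper, whose exact model is not given in the abstract.
    """
    clusters: list[tuple[str, list[str]]] = []
    for raw in locations:
        key = normalize(raw)
        for representative, members in clusters:
            if SequenceMatcher(None, representative, key).ratio() >= threshold:
                members.append(raw)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append((key, [raw]))
    return [members for _, members in clusters]


# Hypothetical sample data, not taken from the USPTO dataset.
sample = [
    "San Jose, CA",
    "SAN JOSE CA",
    "San José, CA",
    "Munich, Germany",
    "Munich Germany.",
]
groups = cluster(sample)
for members in groups:
    print(members)
```

Here the five sample strings collapse into two groups, which mirrors in miniature the kind of reduction in unique locations the abstract reports. The later phases (web knowledge graph lookup and API-driven harmonization) depend on external services and are not sketched.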
About the journal:
The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.