Integrating big data with KNIME as an alternative without programming code: an application to the PATSTAT patent database
Fernando H. Taques, Coro Chasco, Flávio H. Taques
DOI: 10.1007/s10109-024-00445-0
Published: 2024-09-03
Abstract
Accessing massive datasets can be challenging for users unfamiliar with programming. Combining the Konstanz Information Miner (KNIME) with MySQL on standard-configuration equipment addresses this issue. This research proposal aims to present a methodology that describes the necessary configuration steps in both tools and the manipulation required in KNIME to transfer the data to the MySQL environment for further processing in a database management system (DBMS). In addition, we propose a procedure through which research that relies on this point-and-click software can gain in reproducibility and, therefore, in credibility within the scientific community. As a reference, we use a large database of patent applications, PATSTAT Global 2023, provided by the European Patent Office (EPO). As is well known, patent data can be a valuable source for understanding innovation dynamics and technological trends in studies of companies, sectors, nations or regions, at both aggregated and disaggregated levels.
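For readers who want to see what the KNIME-to-MySQL transfer described in the abstract replaces, the following is a minimal sketch of the equivalent scripted step: streaming a large CSV export into a database table in fixed-size chunks and then querying it in the DBMS. It is not the authors' workflow. The table name `applications`, the column names, and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so that the example runs without a server.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical sample standing in for one large CSV export (not real PATSTAT data).
SAMPLE_CSV = """appln_id,appln_auth,filing_year
1,EP,2019
2,US,2020
3,EP,2020
4,JP,2021
5,EP,2021
"""


def load_csv_in_chunks(conn, csv_file, table, chunk_size=2):
    """Stream rows from csv_file into table, committing one chunk at a time."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    total = 0
    while True:
        # Read at most chunk_size rows so the whole file never sits in memory.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", chunk
        )
        conn.commit()
        total += len(chunk)
    return total


# sqlite3 is an assumption made for portability; the source uses MySQL.
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_chunks(conn, io.StringIO(SAMPLE_CSV), "applications")

# Once the data is in the DBMS, aggregation is a plain SQL query.
counts_by_office = conn.execute(
    "SELECT appln_auth, COUNT(*) FROM applications "
    "GROUP BY appln_auth ORDER BY appln_auth"
).fetchall()
print(rows_loaded, counts_by_office)
```

Loading in chunks keeps memory use bounded on standard-configuration equipment, which is the same constraint the abstract names. Against a real MySQL server, the connection object and the placeholder style would differ.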