Renat Shigapov, Thomas Schmidt, Jan Kamlah, Irene Schumm, Jochen Streb, Sibylle Lehmann-Hasemeyer
Data in Brief, volume 58, Article 111238. Journal article, published 2025-02-01.
DOI: 10.1016/j.dib.2024.111238
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742587/pdf/
Publisher page: https://www.sciencedirect.com/science/article/pii/S2352340924012009
Journal metrics as listed: impact factor 1.0, JCR Q3 (Multidisciplinary Sciences).
Citations: 0
Abstract
MBI-KG: A knowledge graph of structured and linked economic research data extracted from the 1937 book “Die Maschinen-Industrie im Deutschen Reich”
The MaschinenBauIndustrie Knowledge Graph (MBI-KG) is a structured and semantically enriched dataset extracted from the 1937 publication “Die Maschinen-Industrie im Deutschen Reich” (The Machinery Industry in the German Reich), published by the “Wirtschaftsgruppe Maschinenbau” and edited by Herbert Patschan. This historical source offers data on German companies within the mechanical engineering industry during the pre-World War II era.
The book was digitized, and Optical Character Recognition (OCR) was applied to extract the text. The unstructured extracted data was then structured and semantically enriched to enable data integration and reuse, and the enriched data was loaded into open-source knowledge-graph software. The resulting knowledge graph includes detailed information about companies, individuals, and administrative entities relevant to the German mechanical engineering industry. The data is accessible through a SPARQL endpoint, an API, advanced search functionalities, a reconciliation API, and bulk files. Each entity in the knowledge graph can be exported in multiple formats, such as CSV, RDF (ttl), JSON, and NDJSON, ensuring compatibility with diverse research tools and platforms.
This dataset can be reused in various research domains, including economic history, data science, and digital humanities. By providing machine-readable, structured data from a crucial historical period, the MBI-KG facilitates novel analyses and insights into the economic and industrial landscape of early 20th-century Germany. The dataset's interoperability with other data sources and its alignment with FAIR principles further enhance its value for interdisciplinary research and long-term preservation.
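As a rough illustration of the SPARQL access route mentioned in the abstract, the following minimal Python sketch builds a standard SPARQL 1.1 Protocol request and flattens a SPARQL JSON results document into plain rows. The endpoint URL is a hypothetical placeholder, since the abstract does not state the real one, and the query uses only the generic `rdfs:label` property because the graph's own classes and properties are not described here. The function names are illustrative as well.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: the abstract does not give the endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query using only standard RDFS vocabulary; the MBI-KG's own
# classes and properties are not given in the abstract, so none are assumed.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label WHERE {
  ?entity rdfs:label ?label .
} LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into plain dict rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

With the real endpoint substituted for the placeholder, `run_query(ENDPOINT, QUERY)` would return one dictionary per result row. Keeping request construction and result parsing separate from the network call makes both easy to check offline.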
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.