MBI-KG: A knowledge graph of structured and linked economic research data extracted from the 1937 book “Die Maschinen-Industrie im Deutschen Reich”

Data in Brief; Impact Factor 1.0 (JCR Q3, Multidisciplinary Sciences); Pub Date: 2025-02-01; DOI: 10.1016/j.dib.2024.111238
Renat Shigapov, Thomas Schmidt, Jan Kamlah, Irene Schumm, Jochen Streb, Sibylle Lehmann-Hasemeyer
Data in Brief, volume 58, Article 111238. Full text: https://www.sciencedirect.com/science/article/pii/S2352340924012009; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742587/pdf/

Abstract

The MaschinenBauIndustrie Knowledge Graph (MBI-KG) is a structured and semantically enriched dataset extracted from the 1937 publication “Die Maschinen-Industrie im Deutschen Reich” (The Machinery Industry in the German Reich), published by the “Wirtschaftsgruppe Maschinenbau” and edited by Herbert Patschan. This historical source offers data on German companies within the mechanical engineering industry during the pre-World War II era.
The book was digitized, and Optical Character Recognition (OCR) was applied to extract the text. The extracted, unstructured text was then structured and semantically enriched to enable data integration and reuse, and the enriched data was loaded into open-source knowledge-graph software. The resulting knowledge graph includes detailed information about companies, individuals, and administrative entities relevant to the German mechanical engineering industry. The data is accessible through a SPARQL endpoint, an API, advanced search functions, a reconciliation API, and bulk files. Each entity in the knowledge graph can be exported in multiple formats, such as CSV, RDF (ttl), JSON, and NDJSON, ensuring compatibility with diverse research tools and platforms.
This dataset can be reused in various research domains, including economic history, data science, and digital humanities. By providing machine-readable, structured data from a crucial historical period, the MBI-KG facilitates novel analyses and insights into the economic and industrial landscape of early 20th-century Germany. The dataset's interoperability with other data sources and its alignment with FAIR principles further enhance its value for interdisciplinary research and long-term preservation.
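
Two of the access routes named above, the SPARQL endpoint and the NDJSON export, can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's own client: the abstract gives no endpoint address, so `ENDPOINT` is a placeholder; the query is a generic triple pattern that assumes no MBI-KG identifiers; and the helper names (`build_sparql_request`, `flatten_bindings`, `parse_ndjson`) are hypothetical. It relies only on the SPARQL 1.1 Protocol (GET with a `query` parameter) and the standard SPARQL JSON results layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: the abstract does not state the endpoint address.
ENDPOINT = "https://example.org/sparql"

# Generic query valid on any RDF store; no MBI-KG property IDs are assumed.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def flatten_bindings(payload: str) -> list[dict[str, str]]:
    """Reduce a SPARQL JSON results document to plain variable -> value rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON (one object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

With a real endpoint substituted for the placeholder, the request could be sent with `urllib.request.urlopen(build_sparql_request(ENDPOINT, QUERY))` and the decoded response body passed to `flatten_bindings`; `parse_ndjson` would apply to the text of a downloaded NDJSON bulk file.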
Source journal: Data in Brief (Multidisciplinary Sciences)
CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 996
Review time: 70 days
About the journal: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.