Inferring building type using textual data and Natural Language Processing for urban building energy modelling
Shihong Zhang, Ya Zhou, Liutao Chen, Yixin Huang, Zhe Wang
Building and Environment, Volume 269 (February 2025), Article 112428
DOI: 10.1016/j.buildenv.2024.112428
Abstract
Building type is among the most important inputs for building energy models. However, building type information is often missing in urban-scale building energy modelling. This paper presents a novel approach to infer building type from building names. First, we created a building name text dataset by fusing GIS spatial data. A rule-based method was developed to estimate building types from naming features. We then trained five machine learning classifiers (four transformer models and one Multilayer Perceptron model) to predict building types. Finally, we used the inferred building types in building energy consumption simulation, addressing the critical data scarcity issue in urban-scale building energy models. Experimental results indicated that our rule-based classification method achieved a precision of 84.3%. The RoBERTa model, the best-performing natural language processing (NLP) model, reached a precision of 91.6% when given both Chinese and English names as input, an improvement of 1.3% over using only the Chinese names and of 1.8% over using only the English names. This research proposes a useful framework for inferring building type with state-of-the-art NLP techniques, paving the way for more accurate and efficient urban-scale building energy modelling.
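The rule-based step (estimating building type from naming features) can be illustrated with a small keyword matcher. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not list the naming features used, so the keyword table `RULES`, the `combine_names` input format (joining the Chinese and English names, loosely mirroring the bilingual input idea), the sample names, and the precision/coverage definitions in `evaluate` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical rule table (the paper's actual naming features are not given).
# Rules are checked in order and the first match wins, so order encodes priority.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("hospital", ("医院", "hospital", "clinic")),
    ("school", ("学校", "中学", "小学", "大学", "school", "university", "college")),
    ("hotel", ("酒店", "宾馆", "hotel")),
    ("office", ("大厦", "写字楼", "办公", "office", "tower")),
    ("residential", ("小区", "公寓", "花园", "residential", "apartment")),
    ("retail", ("商场", "购物中心", "mall", "plaza")),
]


def combine_names(zh: str, en: str) -> str:
    """Join the Chinese and English names into one string (hypothetical format)."""
    parts = [p.strip() for p in (zh, en) if p and p.strip()]
    return " | ".join(parts)


def infer_type(name: str) -> Optional[str]:
    """Return the first building type whose keyword occurs in the name, else None."""
    lowered = name.lower()  # case-fold the English part; Chinese characters are unchanged
    for building_type, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return building_type
    return None  # no rule fired: abstain


def evaluate(samples: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision over names the rules labelled, and coverage (share of names labelled)."""
    predicted = [(infer_type(name), truth) for name, truth in samples]
    labelled = [(pred, truth) for pred, truth in predicted if pred is not None]
    correct = sum(pred == truth for pred, truth in labelled)
    precision = correct / len(labelled) if labelled else 0.0
    coverage = len(labelled) / len(samples) if samples else 0.0
    return precision, coverage


# Illustrative, made-up names with hand-assigned labels (not data from the paper).
EXAMPLE_SAMPLES = [
    (combine_names("第一人民医院", "First People's Hospital"), "hospital"),
    (combine_names("实验中学", "Experimental Middle School"), "school"),
    (combine_names("国际大厦", "International Tower"), "office"),
    (combine_names("滨江花园", "Riverside Garden"), "residential"),
    (combine_names("江景酒店", "River View Hotel"), "hotel"),
    (combine_names("市体育馆", "City Gymnasium"), "sports"),  # no rule fires
    (combine_names("大学生公寓", "Student Apartment"), "residential"),  # "大学" hits the school rule first
]

if __name__ == "__main__":
    precision, coverage = evaluate(EXAMPLE_SAMPLES)
    print(f"precision={precision:.3f}, coverage={coverage:.3f}")
    # prints: precision=0.833, coverage=0.857
```

The ordered, first-match design keeps the rules easy to audit, but the "大学生公寓" (student apartment) case shows how a keyword that is correct in isolation can misfire in context; handling such cases is the kind of gap a learned text classifier is meant to close.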
About the journal:
Building and Environment, an international journal, is dedicated to publishing original research papers, comprehensive review articles, editorials, and short communications in the fields of building science, urban physics, and human interaction with the indoor and outdoor built environment. The journal emphasizes innovative technologies and knowledge verified through measurement and analysis. It covers environmental performance across various spatial scales, from cities and communities to buildings and systems, fostering collaborative, multi-disciplinary research with broader significance.