Chenhong Zhang, Xiaoming Lei, Ye Xia, Limin Sun
Developments in the Built Environment, Volume 20, Article 100549. Published 2024-10-01. Impact factor 6.2 (JCR Q1, Construction & Building Technology).
DOI: 10.1016/j.dibe.2024.100549
https://www.sciencedirect.com/science/article/pii/S2666165924002308
Automatic bridge inspection database construction through hybrid information extraction and large language models
Regular bridge inspections generate extensive reports that, while critical for maintenance, often remain underutilized because of their unstructured format. Traditional information extraction (IE) methods depend on intricate labeling schemes that require time-consuming, labor-intensive manual annotation. This paper presents a novel method for constructing a bridge inspection database using LLM-assisted information extraction. First, we introduce a pseudo-labeling method that uses a closed-source LLM to generate high-quality training data. Then we propose a hybrid extraction pipeline that extracts relevant information segments and processes them with a generation-based IE model fine-tuned on the pseudo-labeled data. Finally, the extracted data is used to construct the bridge inspection database. The proposed method, validated with real-world data, not only achieves higher extraction precision than the closed-source LLM used for pseudo-labeling but also outperforms traditional methods in both data preparation time and extraction accuracy. This approach provides a scalable solution for more proactive, data-driven bridge maintenance strategies.
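The flow described in the abstract (select relevant segments, pass them to a generation-based IE model, store the results) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword filter, the `toy_ie_model` stand-in for the fine-tuned model, the `defects` table schema, and the component and defect vocabularies are all hypothetical choices made for illustration, and the pseudo-labeling and fine-tuning steps are omitted.

```python
import json
import re
import sqlite3
from typing import Callable

# Hypothetical vocabularies; the paper's actual schema is not given in the abstract.
DEFECT_PATTERN = re.compile(r"\b(crack|spalling|corrosion|leakage)\b", re.IGNORECASE)
COMPONENT_PATTERN = re.compile(r"\b(girder|pier|deck|bearing|abutment)\b", re.IGNORECASE)


def select_segments(report: str) -> list[str]:
    """Stage 1 (assumed rule-based): keep sentences that mention a defect keyword."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    return [s for s in sentences if DEFECT_PATTERN.search(s)]


def toy_ie_model(segment: str) -> str:
    """Stage 2 placeholder: mimics a generative IE model by returning a JSON string.

    A real pipeline would call a fine-tuned language model here.
    """
    defect = DEFECT_PATTERN.search(segment).group(1).lower()
    component = COMPONENT_PATTERN.search(segment)
    return json.dumps({
        "component": component.group(1).lower() if component else None,
        "defect": defect,
    })


def build_database(report: str, model: Callable[[str], str],
                   conn: sqlite3.Connection) -> int:
    """Stage 3: parse the model output and insert one row per extracted segment."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS defects (component TEXT, defect TEXT, source TEXT)"
    )
    inserted = 0
    for segment in select_segments(report):
        record = json.loads(model(segment))
        conn.execute(
            "INSERT INTO defects VALUES (?, ?, ?)",
            (record["component"], record["defect"], segment),
        )
        inserted += 1
    conn.commit()
    return inserted
```

Passing the model as a callable keeps the segment filter and the database step independent of whichever extraction model is plugged in; the source sentence is stored alongside each record so that extracted fields can be traced back to the report.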
About the journal:
Developments in the Built Environment (DIBE) is a recently established peer-reviewed gold open access journal, ensuring that all accepted articles are permanently and freely accessible. Focused on civil engineering and the built environment, DIBE publishes original papers and short communications. Encompassing topics such as construction materials and building sustainability, the journal adopts a holistic approach with the aim of benefiting the community.