Identifying business information through deep learning: analyzing the tender documents of an Internet-based logistics bidding platform

IF 1.7 4区 计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Technologies and Applications Pub Date : 2023-05-04 DOI:10.1108/dta-08-2022-0308
Yingwen Yu, Jing Ma
{"title":"Identifying business information through deep learning: analyzing the tender documents of an Internet-based logistics bidding platform","authors":"Yingwen Yu, Jing Ma","doi":"10.1108/dta-08-2022-0308","DOIUrl":null,"url":null,"abstract":"PurposeThe tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on tenderee, shipping location and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process a time- and effort-consuming one. For Chinese logistics tender entities, in particular, existing named entity recognition (NER) solutions are mostly unsuitable as they involve domain-specific terminologies and possess different semantic features.Design/methodology/approachTo tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.FindingsThe proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.Originality/value(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-08-2022-0308","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

PurposeThe tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on tenderee, shipping location and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process a time- and effort-consuming one. For Chinese logistics tender entities, in particular, existing named entity recognition (NER) solutions are mostly unsuitable as they involve domain-specific terminologies and possess different semantic features.Design/methodology/approachTo tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.FindingsThe proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.Originality/value(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过深度学习识别商业信息——分析基于互联网的物流投标平台的投标文件
目的招标文件是互联网物流招标平台的重要数据来源,包含了大量的细粒度数据,包括招标人信息、运输地点信息、运输物品信息等。然而,这一领域的自动化信息提取研究并不充分,使得提取过程既耗时又费力。特别是对于中国的物流招标实体,现有的命名实体识别(NER)解决方案大多不适合,因为它们涉及特定领域的术语,并且具有不同的语义特征。为了解决这个问题,本文提出了一种新的晶格长短期记忆(LSTM)模型,该模型结合了变量上下文特征表示和条件随机场(CRF)层,用于从物流投标文件中识别有价值的实体。与传统的词嵌入不同,该模型使用预训练的双向编码器表示作为输入来增强上下文特征表示。随后,利用Lattice-LSTM模型,有效地利用了字符和单词的信息,避免了错误分割。研究结果:提出的模型随后通过中国物流招标命名实体语料库进行验证。此外,研究结果表明,该模型在物流投标语料库中优于其他主流NER模型。该模型支持物流投标信息的自动提取,使物流公司能够感知不断变化的市场趋势,做出有远见的物流决策。原创性/价值(1)本文提出了一个实用的物流投标NER模型。通过将BERT应用于少量数据的下游任务中并对其进行微调,实验结果表明该模型比现有的其他模型具有更好的性能。据作者所知,这是第一次从中国物流招标文件中提取命名实体的研究。(2)构建了实用的物流实物标书语料库,开发了物流实物标书在线处理模型程序。该模型有助于物流企业将非结构化文档转化为结构化数据,进一步洞察不断变化的市场趋势,做出有远见的物流决策。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Data Technologies and Applications
Data Technologies and Applications Social Sciences-Library and Information Sciences
CiteScore
3.80
自引率
6.20%
发文量
29
期刊介绍: Previously published as: Program Online from: 2018 Subject Area: Information & Knowledge Management, Library Studies
期刊最新文献
Understanding customer behavior by mapping complaints to personality based on social media textual data A systematic review of the use of FHIR to support clinical research, public health and medical education Novel framework for learning performance prediction using pattern identification and deep learning A comparative analysis of job satisfaction prediction models using machine learning: a mixed-method approach Assessing the alignment of corporate ESG disclosures with the UN sustainable development goals: a BERT-based text analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1