A Mixed Neural Network and Support Vector Machine Model for Tender Creation in the European Union TED Database

Sangramsing Kayte, Peter Schneider-Kamp
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, published 2019-09-17
DOI: https://doi.org/10.5220/0008362701390145
Citations: 2

Abstract

This research article proposes a new method for automated text generation and subsequent classification of European Union (EU) Tenders Electronic Daily (TED) text documents into the predefined technological categories of the dataset. The TED dataset provides information about each tender, including features such as Name of project, Title, Description, Types of contract, Common Procurement Vocabulary (CPV) code, and Additional CPV codes. The dataset is obtained from the website of SIMAP, the information system for European public procurement, where tenders are described in XML files. The dataset was preprocessed using tokenization, stop-word removal, punctuation removal, and similar steps. We implemented a neural machine learning model based on Long Short-Term Memory (LSTM) nodes for text generation and subsequent code classification. Text generation here means that, given a single line or just two or three words of a title, the model generates the rest of the sentence. After generating the title, the model predicts the main applicable CPV code for that title. The LSTM model reaches an accuracy of 97% for text generation, and code classification using a Support Vector Machine (SVM) reaches 95%. This experiment is a first step towards a system that, based on TED data, can auto-generate and code-classify tender documents, easing the process of creating tender information and disseminating it to TED and ultimately to relevant vendors. Once developed and automated, such a system could also help anticipate future projects and understand an organisation's ongoing projects and deliveries, based on the tenders it publishes through the SIMAP information system for European public procurement.
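The preprocessing steps named in the abstract (tokenization, stop-word removal, punctuation removal) can be illustrated with a minimal sketch. It is not the authors' pipeline: the XML layout with `<tender>`, `<title>` and `<cpv>` elements is a hypothetical simplification of the real TED schema, the stop-word list is a small assumed sample, and the helper names `preprocess` and `load_tender` are introduced here only for illustration.

```python
import string
import xml.etree.ElementTree as ET

# Hypothetical, simplified tender record. Real TED XML files use a richer,
# namespaced schema; the element names here are assumptions for illustration.
SAMPLE_XML = """
<tender>
  <title>Supply of medical equipment, and related services for the hospital.</title>
  <cpv>33100000</cpv>
</tender>
"""

# Small illustrative stop-word list (an assumption; the abstract names none).
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "in", "on"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def load_tender(xml_text: str) -> tuple[list[str], str]:
    """Return (title tokens, main CPV code) from one hypothetical record."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext("title", default="")
    cpv = root.findtext("cpv", default="").strip()
    return preprocess(title), cpv


tokens, cpv_code = load_tender(SAMPLE_XML)
print(tokens, cpv_code)
```

In a setup like the one the abstract describes, token lists of this kind would feed the title generator, and the CPV code would serve as the classification label; how the authors encode either is not stated in the abstract.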