rmassidda @ DaDoEval: Document Dating Using Sentence Embeddings at EVALITA 2020

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020. DOI: 10.4000/BOOKS.AACCADEMIA.7603

Riccardo Massidda

Citations: 6

Abstract

This report describes an approach to the DaDoEval document dating subtasks of the EVALITA 2020 competition. Dating is tackled as a classification problem, and the considerable length of the documents in the provided dataset is addressed by using sentence embeddings in a hierarchical architecture. Three pre-trained models for generating sentence embeddings were evaluated and compared: USE, LaBSE and SBERT. In addition to sentence embeddings, the classifier exploits a bag-of-entities representation of each document, generated using a pre-trained named entity recognizer. The final model produces the required date for every subtask simultaneously.
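The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify how sentence embeddings are combined, so mean pooling is assumed here as the simplest hierarchical step. The function `embed_sentence` is a hypothetical stand-in for a pre-trained encoder such as USE, LaBSE or SBERT, the entity list stands in for the output of a named entity recognizer, and the head names, class counts and random weights are arbitrary placeholders rather than values from the report.

```python
import hashlib

import numpy as np

EMB_DIM = 16  # assumed toy dimension; real sentence encoders produce larger vectors


def embed_sentence(sentence: str) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained sentence encoder.

    Derives a deterministic pseudo-random vector from a hash of the text.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    return np.random.default_rng(seed).standard_normal(EMB_DIM)


def bag_of_entities(entities: list[str], vocabulary: list[str]) -> np.ndarray:
    """Count how often each vocabulary entity occurs in the document."""
    index = {name: i for i, name in enumerate(vocabulary)}
    counts = np.zeros(len(vocabulary))
    for entity in entities:
        if entity in index:
            counts[index[entity]] += 1
    return counts


def document_features(sentences, entities, vocabulary):
    """Mean-pool sentence embeddings (assumed), then append entity counts."""
    pooled = np.mean([embed_sentence(s) for s in sentences], axis=0)
    return np.concatenate([pooled, bag_of_entities(entities, vocabulary)])


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def predict(features, heads):
    """Apply one linear softmax head per subtask to the shared features."""
    return {name: softmax(W @ features + b) for name, (W, b) in heads.items()}


# Toy inputs: the entity list would come from a named entity recognizer.
vocabulary = ["Roma", "Trieste", "NATO"]
sentences = ["Prima frase del documento.", "Seconda frase del documento."]
entities = ["Roma", "NATO", "Roma"]
features = document_features(sentences, entities, vocabulary)

# Untrained random weights; head names and class counts are placeholders.
rng = np.random.default_rng(0)
heads = {
    "subtask_a": (rng.standard_normal((3, features.size)), np.zeros(3)),
    "subtask_b": (rng.standard_normal((6, features.size)), np.zeros(6)),
}
probabilities = predict(features, heads)
```

Sharing one feature vector across several output heads is one way a single model could answer every subtask at once. In practice the heads would be trained jointly on labelled documents rather than initialised at random.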
