SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

arXiv - CS - Databases | Pub Date: 2024-09-03 | DOI: arxiv-2409.01736

Dean Light, Ahmad Aiashy, Mahmoud Diab, Daniel Nachmias, Stijn Vansummeren, Benny Kimelfeld



Abstract



Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs of increasing complexity within a Jupyter Notebook.
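To make the two directions of interaction concrete, the following minimal sketch illustrates the underlying idea in plain Python. It does not use the SpannerLib or Spannerlog API, which the abstract does not describe; it relies only on the standard `re` module. A spanner is modeled as a function from a document to a relation of spans, and a user-defined function is modeled as an ordinary Python callable applied to each extracted tuple. All names (`Span`, `regex_spanner`, `apply_udf`, `domain_of`) and the sample document are hypothetical and chosen for illustration.

```python
import re
from typing import Callable, NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) of character positions in a document."""
    start: int
    end: int


def regex_spanner(pattern: str, doc: str) -> list[tuple[Span, ...]]:
    """Map a document to a relation: one row per match, one Span per capture group."""
    rows = []
    for match in re.finditer(pattern, doc):
        groups = range(1, match.re.groups + 1)
        rows.append(tuple(Span(*match.span(g)) for g in groups))
    return rows


def text(doc: str, span: Span) -> str:
    """Return the substring of the document covered by a span."""
    return doc[span.start:span.end]


def apply_udf(rows: list[tuple], doc: str, column: int,
              fn: Callable[[str], str]) -> list[tuple]:
    """Extend each row with fn applied to the text of the span in `column`.

    This stands in for a rule invoking custom Python code (hypothetical helper).
    """
    return [row + (fn(text(doc, row[column])),) for row in rows]


def domain_of(email: str) -> str:
    """Illustrative user-defined function: the part of an address after '@'."""
    return email.split("@", 1)[1]


# Hypothetical sample document.
doc = "Ada Lovelace <ada@example.org>, Alan Turing <alan@example.org>"

# Declarative step: a pattern yields a relation of (name, email) spans.
contacts = regex_spanner(r"(\w+ \w+) <([^>\s]+)>", doc)

# Imperative step: Python code enriches every extracted tuple.
enriched = apply_udf(contacts, doc, column=1, fn=domain_of)

for name_span, email_span, domain in enriched:
    print(text(doc, name_span), text(doc, email_span), domain)
```

Keeping spans (positions) rather than extracted strings in the relation is what lets later steps join or filter extractions by where they occur in the text, which is the sense in which text analysis combines with relational querying.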
