Semantic Web: latest articles, page 9

MTab4D: Semantic annotation of tabular data with DBpedia

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-08-25 DOI: 10.3233/sw-223098

Phuc Nguyen, N. Kertkeidkachorn, R. Ichise, Hideaki Takeda

Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, the table contents could be interpreted or inferred using knowledge graph concepts, enabling them to be useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schema, and vocabulary issues. This paper presents an automatic semantic annotation system for tabular data, called MTab4D, to generate annotations with DBpedia in three annotation tasks: 1) matching table cells to entities, 2) matching columns to entity types, and 3) matching pairs of columns to properties. In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noisiness. Additionally, this paper provides insightful analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves an impressive performance for the three annotation tasks. MTab4D’s repository is publicly available at https://github.com/phucty/mtab4dbpedia.
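
The first two annotation tasks named in the abstract (cells to entities, columns to entity types) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `TOY_KB`, `annotate_cells` and `annotate_column_type` are hypothetical names, the lookup table stands in for DBpedia, and majority voting is a simple stand-in technique, not the MTab4D pipeline described in the paper.

```python
from collections import Counter

# Hypothetical toy lookup: surface form -> (entity IRI, most specific type).
# A real system would query DBpedia; this table is invented for illustration.
TOY_KB = {
    "tokyo": ("dbr:Tokyo", "dbo:City"),
    "paris": ("dbr:Paris", "dbo:City"),
    "japan": ("dbr:Japan", "dbo:Country"),
}


def _lookup(cell):
    """Case-insensitive exact lookup; returns (None, None) for unknown cells."""
    return TOY_KB.get(cell.strip().lower(), (None, None))


def annotate_cells(column):
    """Task 1 (cell -> entity): one entity IRI or None per cell."""
    return [_lookup(cell)[0] for cell in column]


def annotate_column_type(column):
    """Task 2 (column -> type): majority vote over the types of matched cells."""
    votes = Counter()
    for cell in column:
        entity_type = _lookup(cell)[1]
        if entity_type is not None:
            votes[entity_type] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Voting over matched cells is one way a column-level signal can tolerate individual cells that fail to match; the third task (column pairs to properties) would need relation data that this toy table does not hold.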



Citations: 0

Morph-KGC: Scalable knowledge graph materialization with mapping partitions

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-08-25 DOI: 10.3233/sw-223135

Julián Arenas-Guerrero, David Chaves-Fraga, Jhon Toledo, María S. Pérez, Óscar Corcho

Knowledge graphs are often constructed from heterogeneous data sources, using declarative rules that map them to a target ontology and materializing them into RDF. When these data sources are large, the materialization of the entire knowledge graph may be computationally expensive and not suitable for those cases where a rapid materialization is required. In this work, we propose an approach to overcome this limitation, based on the novel concept of mapping partitions. Mapping partitions are defined as groups of mapping rules that generate disjoint subsets of the knowledge graph. Each of these groups can be processed separately, reducing the total amount of memory and execution time required by the materialization process. We have included this optimization in our materialization engine Morph-KGC, and we have evaluated it over three different benchmarks. Our experimental results show that, compared with state-of-the-art techniques, the use of mapping partitions in Morph-KGC presents the following advantages: (i) it significantly decreases the time required for materialization, (ii) it reduces peak memory usage, and (iii) it scales to data sizes that other engines currently cannot process.
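
The idea of processing disjoint groups of mapping rules separately can be sketched as follows. This is a toy illustration under stated assumptions, not Morph-KGC's algorithm or API: rules are plain dictionaries with string templates, and grouping by a constant predicate is just one simple criterion that guarantees disjoint output, since two triples with different predicates can never be equal. The abstract does not state the partitioning criteria the engine actually uses.

```python
from collections import defaultdict


def partition_rules(rules):
    """Group rules by their constant predicate (hypothetical criterion).

    Triples with different predicates are never equal, so the groups
    generate disjoint subsets of the graph.
    """
    groups = defaultdict(list)
    for rule in rules:
        groups[rule["predicate"]].append(rule)
    return dict(groups)


def materialize_group(group, rows):
    """Generate the deduplicated triples of a single group."""
    triples = set()
    for rule in group:
        for row in rows:
            triples.add((
                rule["subject"].format(**row),
                rule["predicate"],
                rule["object"].format(**row),
            ))
    return triples


def materialize(rules, rows):
    """Process one group at a time.

    Because groups are disjoint, duplicates only have to be removed
    within a group, and each group's triples can be emitted and
    discarded before the next group is processed.
    """
    for group in partition_rules(rules).values():
        yield from sorted(materialize_group(group, rows))
```

Holding only one group's triples in memory at a time is what the sketch uses to mirror the memory argument in the abstract; the groups could equally be handed to separate workers.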



Citations: 19

Transdisciplinary approach to archaeological investigations in a Semantic Web perspective

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-08-22 DOI: 10.3233/sw-223016

V. Lombardo, T. Karatas, M. Gulmini, L. Guidorzi, D. Angelici

In recent years, the transdisciplinarity of archaeological studies has greatly increased because of the mature interactions between archaeologists and scientists from different disciplines (called “archaeometers”). A number of diverse scientific disciplines collaborate to get an objective account of the archaeological records. A large amount of digital data supports the whole process, and there is great value in keeping the information and knowledge contributed by each intervening discipline coherent. Over the years, a number of representation models have been developed to account for the recording of the archaeological process in databases. Lately, some semantic models, compliant with the CRMarchaeo reference model, have been developed to link the institutional forms with the formal knowledge concerning the archaeological excavations and the related findings. By contrast, the archaeometric processes have not yet been addressed in the Semantic Web community, and only an upper reference model, called CRMsci, accounts for the representation of scientific investigations in general. This paper presents a modular computational ontology for the interlinked representation of all the facts related to the archaeological and archaeometric analyses and interpretations, also connected to the recording catalogues. The computational ontology is compliant with the CIDOC-CRM reference models CRMarchaeo and CRMsci and introduces a number of novel classes and properties to merge the two worlds in a joint representation. The ontology is in use in “Beyond Archaeology”, a methodological project for establishing a transdisciplinary approach to archaeology and archaeometry, interlinked through a semantic model of processes and objects.


Pages: 361-383

Citations: 2

Semantic models and services for conservation and restoration of cultural heritage: A comprehensive survey

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-08-22 DOI: 10.3233/sw-223105

Efthymia Moraitou, Yannis Christodoulou, G. Caridakis

Over the last decade, the Cultural Heritage (CH) domain has gradually adopted Semantic Web (SW) technologies for organizing information and for tackling interoperability issues. Several semantic models have been proposed which accommodate essential aspects of information management: retrieval, integration, reuse and sharing. In this context, the CH subdomain of Conservation and Restoration (CnR) exhibits an increasing interest in SW technologies, in an attempt to effectively handle the highly heterogeneous and often secluded CnR information. This paper investigates semantic models relevant to the CnR knowledge domain. The scope, development methodology and coverage of CnR aspects are described and discussed. Furthermore, the evaluation, deployment and current exploitation of each model are examined, with a focus on the types and variety of services provided to support the CnR professional. Through this study, the following research questions are investigated: To what extent are the various aspects of CnR covered by existing CnR models? To what extent do existing CnR models incorporate models of the broader CH domain and of relevant disciplines (e.g., Chemistry)? In what ways and to what extent do services built upon the reviewed models facilitate CnR professionals in their various tasks? Finally, based on the findings, fields of interest that merit further investigation are suggested.


Pages: 261-291

Citations: 5

Generation of training data for named entity recognition of artworks

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-08-08 DOI: 10.3233/sw-223177

Nitisha Jain, Alejandro Sierra-Múnera, Jan Ehmueller, Ralf Krestel

As machine learning techniques are being increasingly employed for text processing tasks, the need for training data has become a major bottleneck for their application. Manual generation of large-scale training datasets tailored to each task is a time-consuming and expensive process, which necessitates their automated generation. In this work, we turn our attention towards the creation of training datasets for named entity recognition (NER) in the context of the cultural heritage domain. NER plays an important role in many natural language processing systems. Most NER systems are typically limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as digitized art archives, the recognition of fine-grained entity types such as titles of artworks is of high importance. Current state-of-the-art tools are unable to adequately identify artwork titles due to the unavailability of relevant training datasets. We analyse the particular difficulties presented by this domain and motivate the need for quality annotations to train machine learning models for identification of artwork titles. We present a framework with a heuristic-based approach to create high-quality training data by leveraging existing cultural heritage resources from knowledge bases such as Wikidata. Experimental evaluation shows significant improvement over the baseline for NER performance for artwork titles when models are trained on the dataset generated using our framework.
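
One common heuristic for generating NER training data from a knowledge base is dictionary matching: known titles are searched for in tokenized text and the matches are turned into BIO labels. The sketch below assumes that approach purely for illustration; the function name `bio_tag`, the `ART` label and the example titles are hypothetical, and the abstract does not say which heuristics the authors' framework uses.

```python
def bio_tag(tokens, titles):
    """Label tokens B-ART/I-ART where a known title matches, otherwise O.

    At each position the longest matching title wins. Matching is exact
    and case-sensitive, which is a simplifying assumption.
    """
    candidates = [title.split() for title in titles]
    # Drop empty titles and try longer titles first.
    candidates = sorted((c for c in candidates if c), key=len, reverse=True)

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for cand in candidates:
            n = len(cand)
            if tokens[i:i + n] == cand:
                tags[i] = "B-ART"
                tags[i + 1:i + n] = ["I-ART"] * (n - 1)
                i += n  # continue after the matched title
                break
        else:
            i += 1  # no title starts at this token
    return tags
```

Exact matching of this kind is noisy for artwork titles, since short titles can coincide with ordinary words; that is the sort of difficulty the abstract cites as motivation for higher-quality annotations.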


Pages: 239-260

Citations: 1

Editorial of the Special issue on Cultural heritage and semantic web

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-07-28 DOI: 10.3233/sw-223187

Mehwish Alam, Victor de Boer, E. Daga, Marieke Van Erp, E. Hyvönen, Albert Meroño-Peñuela

Guillermo Vega-Gorgojo, Eduardo Gómez-Sánchez, Juan I. Asensio-Pérez, Sergio Serrano-Iglesias, and Alejandra Martínez-Monés. The paper presents Casual Learn, an application that proposes ubiquitous learning tasks about Cultural Heritage. Casual Learn leverages a dataset of 10,000 contextualized learning tasks that were semi-automatically generated out of open data from the Web. Casual Learn offers these tasks to learners according to their physical location. For example, it may suggest describing the characteristics of the Gothic style when passing by a Gothic Cathedral. Additionally, Casual Learn has an interactive mode where learners can geo-search available tasks.


Citations: 2

Linking discourse-level information and the induction of bilingual discourse connective lexicons

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-06-20 DOI: 10.3233/sw-223011

Sibel Özer, Murathan Kurfali, Deniz Zeyrek, Amália Mendes, Giedre Valunaite Oleskeviciene

The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide a more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of six TED talks annotated independently in seven different languages. It is shown that the linguistic labels over the relations annotated in the texts of these languages can be automatically linked with English with high accuracy, as verified against the relations of three diverse languages semi-automatically linked with relations over English texts. The resulting corpus has great potential to reveal the divergences in local discourse relations and to lead to new resources, as exemplified by the induction of bilingual discourse connective lexicons.


Citations: 3

When linguistics meets web technologies. Recent advances in modelling linguistic linked data

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-06-15 DOI: 10.3233/sw-222859

Anas Fahad Khan, C. Chiarcos, T. Declerck, Daniela Gîfu, Elena González-Blanco García, J. Gracia, Maxim Ionov, Penny Labropoulou, Francesco Mambrini, John P. McCrae, Émilie Pagé-Perron, M. Passarotti, Salvador Ros Muñoz, Ciprian-Octavian Truică

This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resources. We then look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.


Pages: 987-1050

Citations: 9

Instance level analysis on linked open data connectivity for cultural heritage entity linking and data integration

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-06-09 DOI: 10.3233/sw-223026

Go Sugimoto

In cultural heritage, many projects execute Named Entity Linking (NEL) through global Linked Open Data (LOD) references in order to identify and disambiguate entities in their local datasets. It allows users to obtain extra information and contextualise the data with it. Thus, the aggregation and integration of heterogeneous LOD are expected. However, such development is still limited partly due to data quality issues. In addition, analysis on the LOD quality has not sufficiently been conducted for cultural heritage. Moreover, most research on data quality concentrates on ontology and corpus level observations. This paper examines the quality of the eleven major LOD sources used for NEL in cultural heritage with an emphasis on instance-level connectivity and graph traversals. Standardised linking properties are inspected for 100 instances/entities in order to create traversal route maps. Other properties are also assessed for quantity and quality. The outcomes suggest that the LOD is not fully interconnected and centrally condensed; the quantity and quality are unbalanced. Therefore, they cast doubt on the possibility of automatically identifying, accessing, and integrating known and unknown datasets. This implies the need for LOD improvement, as well as the NEL strategies to maximise the data integration.
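
The notion of a traversal route over linking properties can be illustrated with a breadth-first search over an adjacency map. This is only a sketch under stated assumptions: `links` is a hypothetical in-memory map from an entity identifier to the identifiers it points to through some linking property, the identifiers in the usage note are invented, and the paper's own inspection procedure is not described in the abstract.

```python
from collections import deque


def traversal_routes(links, start):
    """Breadth-first search over outgoing links.

    Returns a dict mapping every identifier reachable from `start`
    to the shortest route (list of identifiers) that leads to it.
    """
    routes = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            if target not in routes:  # first visit gives the shortest route
                routes[target] = routes[node] + [target]
                queue.append(target)
    return routes
```

With a map such as `{"wd:Q1": ["viaf:1"], "viaf:1": ["gnd:1"]}` (made-up identifiers), entities missing from the result for a given start are the ones that cannot be reached by following links, which is the kind of gap the abstract reports when it says the LOD is not fully interconnected.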


Pages: 55-100

Citations: 0

Analysis of ontologies and policy languages to represent information flows in GDPR

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2022-06-07 DOI: 10.3233/sw-223009

Beatriz Esteves, V. Rodríguez-Doncel

This article surveys existing vocabularies, ontologies and policy languages that can be used to represent informational items referenced in GDPR rights and obligations, such as the ‘notification of a data breach’, the ‘controller’s identity’ or a ‘DPIA’. Rights and obligations in GDPR are analyzed in terms of information flows between different stakeholders, and a complete collection of 57 different informational items that are mentioned by GDPR is described. Thirteen privacy-related policy languages and nine data protection vocabularies and ontologies are studied in relation to this list of informational items. ODRL and LegalRuleML emerge as the languages that can respond positively to a greater number of the defined comparison criteria if complemented with DPV and GDPRtEXT, since 39 out of the 57 informational items can be modelled. Online supplementary material is provided, including a simple search application and a taxonomy of the identified entities.


Citations: 10