MTab4D: Semantic annotation of tabular data with DBpedia
Phuc Nguyen, N. Kertkeidkachorn, R. Ichise, Hideaki Takeda
Semantic Web, 2022. doi:10.3233/sw-223098
Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, table contents can be interpreted or inferred using knowledge graph concepts, making them useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schemas, and vocabulary issues. This paper presents an automatic semantic annotation system for tabular data, called MTab4D, that generates annotations with DBpedia for three annotation tasks: 1) matching table cells to entities, 2) matching columns to entity types, and 3) matching pairs of columns to properties. In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noise. Additionally, this paper provides insightful analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves impressive performance on all three annotation tasks. MTab4D’s repository is publicly available at https://github.com/phucty/mtab4dbpedia.
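The aggregation step described in the abstract, combining multiple matching signals into a single candidate score per cell, can be illustrated with a minimal sketch. The signals, weights, and candidate data below are hypothetical simplifications, not MTab4D's actual model:

```python
# Minimal sketch: rank candidate entities for a table cell by a weighted
# sum of matching signals, in the spirit of MTab4D's signal aggregation.
from difflib import SequenceMatcher

def string_signal(cell: str, label: str) -> float:
    """Surface similarity between the cell text and an entity label."""
    return SequenceMatcher(None, cell.lower(), label.lower()).ratio()

def type_signal(candidate_types: set, column_type: str) -> float:
    """1.0 if the candidate's types agree with the column's predicted type."""
    return 1.0 if column_type in candidate_types else 0.0

def rank_candidates(cell, column_type, candidates, w_str=0.7, w_type=0.3):
    """Score each (label, types, uri) candidate; weights are illustrative."""
    scored = [
        (w_str * string_signal(cell, label)
         + w_type * type_signal(types, column_type), uri)
        for label, types, uri in candidates
    ]
    return sorted(scored, reverse=True)

# Toy example: disambiguating the cell "Paris" in a column typed dbo:City.
candidates = [
    ("Paris", {"dbo:City", "dbo:Place"}, "dbr:Paris"),
    ("Paris Hilton", {"dbo:Person"}, "dbr:Paris_Hilton"),
]
print(rank_candidates("Paris", "dbo:City", candidates)[0])  # dbr:Paris wins
```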
{"title":"MTab4D: Semantic annotation of tabular data with DBpedia","authors":"Phuc Nguyen, N. Kertkeidkachorn, R. Ichise, Hideaki Takeda","doi":"10.3233/sw-223098","DOIUrl":"https://doi.org/10.3233/sw-223098","url":null,"abstract":"Semantic annotation of tabular data is the process of matching table elements with knowledge graphs. As a result, the table contents could be interpreted or inferred using knowledge graph concepts, enabling them to be useful in downstream applications such as data analytics and management. Nevertheless, semantic annotation tasks are challenging due to insufficient tabular data descriptions, heterogeneous schema, and vocabulary issues. This paper presents an automatic semantic annotation system for tabular data, called MTab4D, to generate annotations with DBpedia in three annotation tasks: 1) matching table cells to entities, 2) matching columns to entity types, and 3) matching pairs of columns to properties. In particular, we propose an annotation pipeline that combines multiple matching signals from different table elements to address schema heterogeneity, data ambiguity, and noisiness. Additionally, this paper provides insightful analysis and extra resources on benchmarking semantic annotation with knowledge graphs. Experimental results on the original and adapted datasets of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019) show that our system achieves an impressive performance for the three annotation tasks. MTab4D’s repository is publicly available at https://github.com/phucty/mtab4dbpedia.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"16 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79205263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Morph-KGC: Scalable knowledge graph materialization with mapping partitions
Julián Arenas-Guerrero, David Chaves-Fraga, Jhon Toledo, María S. Pérez, Óscar Corcho
Semantic Web, 2022. doi:10.3233/sw-223135
Knowledge graphs are often constructed from heterogeneous data sources, using declarative rules that map the sources to a target ontology, and then materialized into RDF. When these data sources are large, materializing the entire knowledge graph may be computationally expensive and unsuitable for cases where rapid materialization is required. In this work, we propose an approach to overcome this limitation, based on the novel concept of mapping partitions. Mapping partitions are defined as groups of mapping rules that generate disjoint subsets of the knowledge graph. Each of these groups can be processed separately, reducing the total amount of memory and execution time required by the materialization process. We have included this optimization in our materialization engine Morph-KGC and evaluated it over three different benchmarks. Our experimental results show that, compared with state-of-the-art techniques, the use of mapping partitions in Morph-KGC presents the following advantages: (i) it significantly decreases the time required for materialization, (ii) it reduces peak memory usage, and (iii) it scales to data sizes that other engines are currently not capable of processing.
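A minimal sketch of the mapping-partition idea follows. It partitions rules by their constant predicate, one simple invariant that guarantees disjoint outputs; Morph-KGC's actual partitioning criterion is more general, and the rule and record shapes here are invented for illustration:

```python
# Sketch: group mapping rules so groups generate disjoint triples, then
# materialize one group at a time, bounding peak memory.
from collections import defaultdict

rules = [
    {"source": "persons.csv", "subject": "ex:person/{id}",
     "predicate": "foaf:name", "object": "{name}"},
    {"source": "persons.csv", "subject": "ex:person/{id}",
     "predicate": "foaf:age", "object": "{age}"},
]

def partition_rules(rules):
    """Group rules by constant predicate: a crude disjointness invariant."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule["predicate"]].append(rule)
    return groups

def materialize(rule, records):
    """Instantiate one rule's triple templates over the source records."""
    for rec in records:
        yield (rule["subject"].format(**rec),
               rule["predicate"],
               rule["object"].format(**rec))

records = [{"id": "1", "name": "Ada", "age": "36"}]
for predicate, group in partition_rules(rules).items():
    # Each partition is processed and can be flushed independently.
    triples = [t for rule in group for t in materialize(rule, records)]
    print(predicate, triples)
```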
{"title":"Morph-KGC: Scalable knowledge graph materialization with mapping partitions","authors":"Julián Arenas-Guerrero, David Chaves-Fraga, Jhon Toledo, María S. Pérez, Óscar Corcho","doi":"10.3233/sw-223135","DOIUrl":"https://doi.org/10.3233/sw-223135","url":null,"abstract":"Knowledge graphs are often constructed from heterogeneous data sources, using declarative rules that map them to a target ontology and materializing them into RDF. When these data sources are large, the materialization of the entire knowledge graph may be computationally expensive and not suitable for those cases where a rapid materialization is required. In this work, we propose an approach to overcome this limitation, based on the novel concept of mapping partitions. Mapping partitions are defined as groups of mapping rules that generate disjoint subsets of the knowledge graph. Each of these groups can be processed separately, reducing the total amount of memory and execution time required by the materialization process. We have included this optimization in our materialization engine Morph-KGC, and we have evaluated it over three different benchmarks. Our experimental results show that, compared with state-of-the-art techniques, the use of mapping partitions in Morph-KGC presents the following advantages: (i) it decreases significantly the time required for materialization, (ii) it reduces the maximum peak of memory used, and (iii) it scales to data sizes that other engines are not capable of processing currently.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"20 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84427491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transdisciplinary approach to archaeological investigations in a Semantic Web perspective
V. Lombardo, T. Karatas, M. Gulmini, L. Guidorzi, D. Angelici
Semantic Web, 2022, pp. 361-383. doi:10.3233/sw-223016
In recent years, the transdisciplinarity of archaeological studies has greatly increased thanks to mature interactions between archaeologists and scientists from different disciplines (called “archaeometers”). A number of diverse scientific disciplines collaborate to produce an objective account of the archaeological record. A large amount of digital data supports the whole process, and there is great value in keeping the information and knowledge contributed by each intervening discipline coherent. Over the years, a number of representation models have been developed to record the archaeological process in databases. Lately, some semantic models, compliant with the CRMarchaeo reference model, have been developed to link institutional forms with the formal knowledge concerning archaeological excavations and the related findings. In contrast, archaeometric processes have not yet been addressed in the Semantic Web community, and only an upper reference model, called CRMsci, accounts for the representation of scientific investigations in general. This paper presents a modular computational ontology for the interlinked representation of all the facts related to archaeological and archaeometric analyses and interpretations, also connected to the recording catalogues. The computational ontology is compliant with the CIDOC-CRM reference models CRMarchaeo and CRMsci and introduces a number of novel classes and properties to merge the two worlds in a joint representation. The ontology is in use in “Beyond Archaeology”, a methodological project for establishing a transdisciplinary approach to archaeology and archaeometry, interlinked through a semantic model of processes and objects.
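As a rough illustration of the joint representation the ontology targets, the rdflib sketch below links a find both to its excavation context (in the style of CRMarchaeo) and to an archaeometric measurement (in the style of CRMsci). The class and property names under the ex: namespace are stand-ins for the reference models' terms, not the paper's actual vocabulary:

```python
# Sketch only: illustrative archaeo + archaeometry triples; the ex:
# classes/properties approximate CRMarchaeo/CRMsci terms.
from rdflib import Graph, Namespace, RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("http://example.org/beyond-archaeology/")

g = Graph()
g.bind("crm", CRM)
g.bind("ex", EX)
g.add((EX.find_042, RDF.type, CRM["E22_Human-Made_Object"]))
g.add((EX.stratum_7, RDF.type, EX.StratigraphicUnit))   # CRMarchaeo-style
g.add((EX.find_042, EX.wasFoundIn, EX.stratum_7))       # excavation context
g.add((EX.xrf_run_3, RDF.type, EX.Measurement))         # CRMsci-style
g.add((EX.xrf_run_3, EX.measured, EX.find_042))         # archaeometric link

print(g.serialize(format="turtle"))
```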
{"title":"Transdisciplinary approach to archaeological investigations in a Semantic Web perspective","authors":"V. Lombardo, T. Karatas, M. Gulmini, L. Guidorzi, D. Angelici","doi":"10.3233/sw-223016","DOIUrl":"https://doi.org/10.3233/sw-223016","url":null,"abstract":"In recent years, the transdisciplinarity of archaeological studies has greatly increased because of the mature interactions between archaeologists and scientists from different disciplines (called “archaeometers”). A number of diverse scientific disciplines collaborate to get an objective account of the archaeological records. A large amount of digital data support the whole process, and there is a great value in keeping the coherence of information and knowledge, as contributed by each intervening discipline. During the years, a number of representation models have been developed to account for the recording of the archaeological process in data bases. Lately, some semantic models, compliant with the CRMarchaeo reference model, have been developed to account for linking the institutional forms with the formal knowledge concerning the archaeological excavations and the related findings. On the contrary, the archaeometric processes have not been addressed yet in the Semantic Web community and only an upper reference model, called CRMsci, accounts for the representation of the scientific investigations in general. This paper presents a modular computational ontology for the interlinked representation of all the facts related to the archaeological and archaeometric analyses and interpretations, also connected to the recording catalogues. The computational ontology is compliant with CIDOC-CRM reference models CRMarchaeo and CRMsci and introduces a number of novel classes and properties to merge the two worlds in a joint representation. The ontology is in use in “Beyond Archaeology”, a methodological project for the establishing of a transdisciplinary approach to archaeology and archaeometry, interlinked through a semantic model of processes and objects.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"27 1","pages":"361-383"},"PeriodicalIF":3.0,"publicationDate":"2022-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85617298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic models and services for conservation and restoration of cultural heritage: A comprehensive survey
Efthymia Moraitou, Yannis Christodoulou, G. Caridakis
Semantic Web, 2022, pp. 261-291. doi:10.3233/sw-223105
Over the last decade, the Cultural Heritage (CH) domain has gradually adopted Semantic Web (SW) technologies for organizing information and tackling interoperability issues. Several semantic models have been proposed that accommodate essential aspects of information management: retrieval, integration, reuse and sharing. In this context, the CH subdomain of Conservation and Restoration (CnR) exhibits an increasing interest in SW technologies, in an attempt to effectively handle the highly heterogeneous and often secluded CnR information. This paper investigates semantic models relevant to the CnR knowledge domain. The scope, development methodology and coverage of CnR aspects are described and discussed. Furthermore, the evaluation, deployment and current exploitation of each model are examined, with a focus on the types and variety of services provided to support the CnR professional. Through this study, the following research questions are investigated: To what extent are the various aspects of CnR covered by existing CnR models? To what extent do existing CnR models incorporate models of the broader CH domain and of relevant disciplines (e.g., Chemistry)? In what ways and to what extent do services built upon the reviewed models facilitate CnR professionals in their various tasks? Finally, based on the findings, fields of interest that merit further investigation are suggested.
{"title":"Semantic models and services for conservation and restoration of cultural heritage: A comprehensive survey","authors":"Efthymia Moraitou, Yannis Christodoulou, G. Caridakis","doi":"10.3233/sw-223105","DOIUrl":"https://doi.org/10.3233/sw-223105","url":null,"abstract":"Over the last decade, the Cultural Heritage (CH) domain has gradually adopted Semantic Web (SW) technologies for organizing information and for tackling interoperability issues. Several semantic models have been proposed which accommodate essential aspects of information management: retrieval, integration, reuse and sharing. In this context, the CH subdomain of Conservation and Restoration (CnR) exhibits an increasing interest in SW technologies, in an attempt to effectively handle the highly heterogeneous and often secluded CnR information. This paper investigates semantic models relevant to the CnR knowledge domain. The scope, development methodology and coverage of CnR aspects are described and discussed. Furthermore, the evaluation, deployment and current exploitation of each model are examined, with focus on the types and variety of services provided to support the CnR professional. Through this study, the following research questions are investigated: To what extent the various aspects of CnR are covered by existing CnR models? To what extent existing CnR models incorporate models of the broader CH domain and of relevant disciplines (e.g., Chemistry)? In what ways and to what extent services built upon the reviewed models facilitate CnR professionals in their various tasks? Finally, based on the findings, fields of interest that merit further investigation are suggested.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"108 1","pages":"261-291"},"PeriodicalIF":3.0,"publicationDate":"2022-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74660581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generation of training data for named entity recognition of artworks
Nitisha Jain, Alejandro Sierra-Múnera, Jan Ehmueller, Ralf Krestel
Semantic Web, 2022, pp. 239-260. doi:10.3233/sw-223177
As machine learning techniques are increasingly employed for text processing tasks, the need for training data has become a major bottleneck for their application. Manually generating large-scale training datasets tailored to each task is a time-consuming and expensive process, which necessitates their automated generation. In this work, we turn our attention to the creation of training datasets for named entity recognition (NER) in the context of the cultural heritage domain. NER plays an important role in many natural language processing systems. Most NER systems are typically limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as digitized art archives, the recognition of fine-grained entity types such as titles of artworks is of high importance. Current state-of-the-art tools are unable to adequately identify artwork titles due to the unavailability of relevant training datasets. We analyse the particular difficulties presented by this domain and motivate the need for quality annotations to train machine learning models for the identification of artwork titles. We present a framework with a heuristic-based approach to create high-quality training data by leveraging existing cultural heritage resources from knowledge bases such as Wikidata. Experimental evaluation shows a significant improvement over the baseline in NER performance for artwork titles when models are trained on a dataset generated using our framework.
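The heuristic idea, projecting a gazetteer of artwork titles onto raw text to obtain labelled examples, can be sketched as follows. The BIO tagging below is a toy version: a real pipeline would add proper tokenization, ambiguity filtering, and context checks, and the titles are illustrative:

```python
# Toy sketch: turn a gazetteer of artwork titles (e.g., harvested from
# Wikidata) into BIO-tagged NER training data by exact span matching.
def bio_tag(tokens, titles):
    tags = ["O"] * len(tokens)
    for title in titles:
        t = title.split()
        # Slide a window over the tokens looking for the full title.
        for i in range(len(tokens) - len(t) + 1):
            if tokens[i:i + len(t)] == t:
                tags[i] = "B-WORK"
                for j in range(i + 1, i + len(t)):
                    tags[j] = "I-WORK"
    return list(zip(tokens, tags))

titles = ["Mona Lisa", "The Starry Night"]
tokens = "Da Vinci painted the Mona Lisa in Florence".split()
print(bio_tag(tokens, titles))
# [..., ('Mona', 'B-WORK'), ('Lisa', 'I-WORK'), ...]
```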
{"title":"Generation of training data for named entity recognition of artworks","authors":"Nitisha Jain, Alejandro Sierra-Múnera, Jan Ehmueller, Ralf Krestel","doi":"10.3233/sw-223177","DOIUrl":"https://doi.org/10.3233/sw-223177","url":null,"abstract":"As machine learning techniques are being increasingly employed for text processing tasks, the need for training data has become a major bottleneck for their application. Manual generation of large scale training datasets tailored to each task is a time consuming and expensive process, which necessitates their automated generation. In this work, we turn our attention towards creation of training datasets for named entity recognition (NER) in the context of the cultural heritage domain. NER plays an important role in many natural language processing systems. Most NER systems are typically limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as digitized art archives, the recognition of fine-grained entity types such as titles of artworks is of high importance. Current state of the art tools are unable to adequately identify artwork titles due to unavailability of relevant training datasets. We analyse the particular difficulties presented by this domain and motivate the need for quality annotations to train machine learning models for identification of artwork titles. We present a framework with heuristic based approach to create high-quality training data by leveraging existing cultural heritage resources from knowledge bases such as Wikidata. Experimental evaluation shows significant improvement over the baseline for NER performance for artwork titles when models are trained on the dataset generated using our framework.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"21 1","pages":"239-260"},"PeriodicalIF":3.0,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84507185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial of the Special issue on Cultural heritage and semantic web
Mehwish Alam, Victor de Boer, E. Daga, Marieke Van Erp, E. Hyvönen, Albert Meroño-Peñuela
Semantic Web, 2022, pp. 155-158. doi:10.3233/sw-223187
One of the contributions, by Guillermo Vega-Gorgojo, Eduardo Gómez-Sánchez, Juan I. Asensio-Pérez, Sergio Serrano-Iglesias, and Alejandra Martínez-Monés, presents Casual Learn, an application that proposes ubiquitous learning tasks about Cultural Heritage. Casual Learn leverages a dataset of 10,000 contextualized learning tasks that were semi-automatically generated from open data on the Web. Casual Learn offers these tasks to learners according to their physical location. For example, it may suggest describing the characteristics of the Gothic style when passing by a Gothic cathedral. Additionally, Casual Learn has an interactive mode where learners can geo-search available tasks.
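The location-based suggestion behaviour can be pictured with a short sketch that filters tasks by great-circle distance from the learner. The task data, coordinates, and 200 m threshold are invented; this is not Casual Learn's actual code:

```python
# Sketch: suggest learning tasks within 200 m of the learner's position.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# One invented task near a cathedral; (task text, lat, lon).
tasks = [("Describe the characteristics of the Gothic style",
          41.6529, -4.7286)]
learner = (41.6526, -4.7280)  # learner's current position

nearby = [t for t in tasks if haversine_km(*learner, t[1], t[2]) < 0.2]
print(nearby)
```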
Linking discourse-level information and the induction of bilingual discourse connective lexicons
Sibel Özer, Murathan Kurfali, Deniz Zeyrek, Amália Mendes, Giedre Valunaite Oleskeviciene
Semantic Web, 2022, pp. 1081-1102. doi:10.3233/sw-223011
The single biggest obstacle to comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. Existing resources are overwhelmingly monolingual, compelling researchers to infer discourse-level information in the target languages through error-prone automatic means. This paper aims to provide more direct insight into cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of six TED talks independently annotated in seven different languages. It is shown that the linguistic labels over the relations annotated in the texts of these languages can be automatically linked with English with high accuracy, as verified against the relations of three diverse languages that were semi-automatically linked with relations over the English texts. The resulting corpus has great potential to reveal divergences in local discourse relations, as well as to lead to new resources, as exemplified by the induction of bilingual discourse connective lexicons.
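One plausible way to picture the linking step is to map each target-language relation to the English relation whose aligned tokens overlap it most. The sketch below is a simplification under that assumption; the spans and word alignments are toy inputs, not TED-MDB data:

```python
# Sketch: link discourse relations across languages via token-alignment
# overlap. Real linking involves more careful span and sense handling.
def link_relations(tgt_relations, en_relations, alignment):
    """alignment maps target token index -> English token index."""
    links = {}
    for t_id, t_span in tgt_relations.items():
        # Project the target span into English token positions.
        projected = {alignment[i] for i in t_span if i in alignment}
        best = max(en_relations,
                   key=lambda e_id: len(projected & en_relations[e_id]),
                   default=None)
        links[t_id] = best
    return links

tgt = {"rel1": {0, 1, 2}}           # token span of a target-language relation
en = {"relA": {0, 1}, "relB": {7}}  # token spans of English relations
alignment = {0: 0, 1: 1, 2: 3}
print(link_relations(tgt, en, alignment))  # rel1 -> relA
```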
{"title":"Linking discourse-level information and the induction of bilingual discourse connective lexicons","authors":"Sibel Özer, Murathan Kurfali, Deniz Zeyrek, Amália Mendes, Giedre Valunaite Oleskeviciene","doi":"10.3233/sw-223011","DOIUrl":"https://doi.org/10.3233/sw-223011","url":null,"abstract":"The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide a more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of independently annotated six TED talks in seven different languages. It is shown that the linguistic labels over the relations annotated in the texts of these languages can be automatically linked with English with high accuracy, as verified against the relations of three diverse languages semi-automatically linked with relations over English texts. The resulting corpus has a great potential to reveal the divergences in local discourse relations, as well as leading to new resources, as exemplified by the induction of bilingual discourse connective lexicons.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"103 1","pages":"1081-1102"},"PeriodicalIF":3.0,"publicationDate":"2022-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85861348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When linguistics meets web technologies. Recent advances in modelling linguistic linked data
Anas Fahad Khan, C. Chiarcos, T. Declerck, Daniela Gîfu, Elena González-Blanco García, J. Gracia, Maxim Ionov, Penny Labropoulou, Francesco Mambrini, John P. McCrae, Émilie Pagé-Perron, M. Passarotti, Salvador Ros Muñoz, Ciprian-Octavian Truică
Semantic Web, 2022, pp. 987-1050. doi:10.3233/sw-222859
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area while building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends that have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. We then look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix consists of a brief introduction to the OntoLex-Lemon model.
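Since the appendix introduces the OntoLex-Lemon model, a minimal rdflib sketch of an OntoLex-Lemon lexical entry may help: an entry with a canonical form and a sense referencing a knowledge graph concept. The lexicon and entry IRIs are invented:

```python
# Sketch: one OntoLex-Lemon lexical entry serialized as Turtle.
from rdflib import Graph, Literal, Namespace, RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
EX = Namespace("http://example.org/lexicon/")

g = Graph()
g.bind("ontolex", ONTOLEX)
g.add((EX.bank_entry, RDF.type, ONTOLEX.LexicalEntry))
g.add((EX.bank_entry, ONTOLEX.canonicalForm, EX.bank_form))
g.add((EX.bank_form, ONTOLEX.writtenRep, Literal("bank", lang="en")))
g.add((EX.bank_entry, ONTOLEX.sense, EX.bank_sense1))
# A sense grounds the entry in a knowledge graph concept.
g.add((EX.bank_sense1, ONTOLEX.reference,
       Namespace("http://dbpedia.org/resource/")["Bank"]))

print(g.serialize(format="turtle"))
```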
{"title":"When linguistics meets web technologies. Recent advances in modelling linguistic linked data","authors":"Anas Fahad Khan, C. Chiarcos, T. Declerck, Daniela Gîfu, Elena González-Blanco García, J. Gracia, Maxim Ionov, Penny Labropoulou, Francesco Mambrini, John P. McCrae, Émilie Pagé-Perron, M. Passarotti, Salvador Ros Muñoz, Ciprian-Octavian Truică","doi":"10.3233/sw-222859","DOIUrl":"https://doi.org/10.3233/sw-222859","url":null,"abstract":"This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After which we look at some of the latest developments in community standards and initiatives including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"1 1","pages":"987-1050"},"PeriodicalIF":3.0,"publicationDate":"2022-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85682881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instance level analysis on linked open data connectivity for cultural heritage entity linking and data integration
Go Sugimoto
Semantic Web, 2022, pp. 55-100. doi:10.3233/sw-223026
In cultural heritage, many projects perform Named Entity Linking (NEL) through global Linked Open Data (LOD) references in order to identify and disambiguate entities in their local datasets. This allows users to obtain extra information and to contextualise their data with it. The aggregation and integration of heterogeneous LOD are therefore expected. However, such development is still limited, partly due to data quality issues. In addition, analysis of LOD quality has not been sufficiently conducted for cultural heritage. Moreover, most research on data quality concentrates on ontology- and corpus-level observations. This paper examines the quality of eleven major LOD sources used for NEL in cultural heritage, with an emphasis on instance-level connectivity and graph traversals. Standardised linking properties are inspected for 100 instances/entities in order to create traversal route maps. Other properties are also assessed for quantity and quality. The outcomes suggest that the LOD is neither fully interconnected nor centrally condensed, and that quantity and quality are unbalanced. They therefore cast doubt on the possibility of automatically identifying, accessing, and integrating known and unknown datasets. This implies the need for LOD improvement, as well as for NEL strategies that maximise data integration.
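The instance-level traversal can be pictured as repeatedly dereferencing an entity and following its standardised linking properties. The sketch below performs one hop of such a traversal with rdflib; the example entity and live network access are assumptions:

```python
# Sketch: dereference an entity and collect its outgoing owl:sameAs /
# skos:exactMatch links, i.e. one hop of a traversal route map.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, SKOS

def linking_targets(entity_uri: str):
    g = Graph()
    g.parse(entity_uri)  # HTTP dereference with content negotiation
    subject = URIRef(entity_uri)
    for prop in (OWL.sameAs, SKOS.exactMatch):
        for target in g.objects(subject, prop):
            yield prop, target

# Requires network access; the entity is an arbitrary example.
for prop, target in linking_targets("http://dbpedia.org/resource/Mona_Lisa"):
    print(prop, target)
```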
{"title":"Instance level analysis on linked open data connectivity for cultural heritage entity linking and data integration","authors":"Go Sugimoto","doi":"10.3233/sw-223026","DOIUrl":"https://doi.org/10.3233/sw-223026","url":null,"abstract":"In cultural heritage, many projects execute Named Entity Linking (NEL) through global Linked Open Data (LOD) references in order to identify and disambiguate entities in their local datasets. It allows users to obtain extra information and contextualise the data with it. Thus, the aggregation and integration of heterogeneous LOD are expected. However, such development is still limited partly due to data quality issues. In addition, analysis on the LOD quality has not sufficiently been conducted for cultural heritage. Moreover, most research on data quality concentrates on ontology and corpus level observations. This paper examines the quality of the eleven major LOD sources used for NEL in cultural heritage with an emphasis on instance-level connectivity and graph traversals. Standardised linking properties are inspected for 100 instances/entities in order to create traversal route maps. Other properties are also assessed for quantity and quality. The outcomes suggest that the LOD is not fully interconnected and centrally condensed; the quantity and quality are unbalanced. Therefore, they cast doubt on the possibility of automatically identifying, accessing, and integrating known and unknown datasets. This implies the need for LOD improvement, as well as the NEL strategies to maximise the data integration.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"10 1","pages":"55-100"},"PeriodicalIF":3.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81930212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of ontologies and policy languages to represent information flows in GDPR
Beatriz Esteves, V. Rodríguez-Doncel
Semantic Web, 2022. doi:10.3233/sw-223009
This article surveys existing vocabularies, ontologies and policy languages that can be used to represent informational items referenced in GDPR rights and obligations, such as the ‘notification of a data breach’, the ‘controller’s identity’ or a ‘DPIA’. Rights and obligations in GDPR are analyzed in terms of information flows between different stakeholders, and a complete collection of 57 different informational items mentioned by GDPR is described. Thirteen privacy-related policy languages and nine data protection vocabularies and ontologies are studied in relation to this list of informational items. ODRL and LegalRuleML emerge as the languages that, when complemented with DPV and GDPRtEXT, respond positively to the greatest number of the defined comparison criteria, since 39 of the 57 informational items can be modelled. Online supplementary material is provided, including a simple search application and a taxonomy of the identified entities.
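A minimal rdflib sketch of the combination the survey singles out, an ODRL permission annotated with DPV terms, is shown below. The policy IRI and the particular DPV terms chosen are illustrative:

```python
# Sketch: an ODRL permission over personal data, annotated with DPV.
from rdflib import Graph, Namespace, RDF

ODRL = Namespace("http://www.w3.org/ns/odrl/2/")
DPV = Namespace("https://w3id.org/dpv#")
EX = Namespace("http://example.org/policies/")

g = Graph()
g.bind("odrl", ODRL)
g.bind("dpv", DPV)
g.add((EX.policy1, RDF.type, ODRL.Policy))
g.add((EX.policy1, ODRL.permission, EX.perm1))
g.add((EX.perm1, ODRL.action, ODRL.use))          # permitted action
g.add((EX.perm1, ODRL.target, EX.email_dataset))  # asset under the policy
g.add((EX.email_dataset, RDF.type, DPV.PersonalData))
g.add((EX.perm1, DPV.hasPurpose, DPV.ServiceProvision))

print(g.serialize(format="turtle"))
```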
{"title":"Analysis of ontologies and policy languages to represent information flows in GDPR","authors":"Beatriz Esteves, V. Rodríguez-Doncel","doi":"10.3233/sw-223009","DOIUrl":"https://doi.org/10.3233/sw-223009","url":null,"abstract":"This article surveys existing vocabularies, ontologies and policy languages that can be used to represent informational items referenced in GDPR rights and obligations, such as the ‘notification of a data breach’, the ‘controller’s identity’ or a ‘DPIA’. Rights and obligations in GDPR are analyzed in terms of information flows between different stakeholders, and a complete collection of 57 different informational items that are mentioned by GDPR is described. 13 privacy-related policy languages and 9 data protection vocabularies and ontologies are studied in relation to this list of informational items. ODRL and LegalRuleML emerge as the languages that can respond positively to a greater number of the defined comparison criteria if complemented with DPV and GDPRtEXT, since 39 out of the 57 informational items can be modelled. Online supplementary material is provided, including a simple search application and a taxonomy of the identified entities.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"30 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72513687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}