Knowledge Graphs (KGs) comprise interlinked information in the form of entities and the relations between them in a particular domain, and they provide the backbone for many applications. However, KGs are often incomplete, as links between entities are missing. Link prediction is the task of predicting these missing links in a KG based on the existing ones. Recent years have witnessed many studies on link prediction using KG embeddings, which is one of the mainstream tasks in KG completion. Most existing methods learn latent representations of the entities and relations, whereas only a few also consider contextual information or the textual descriptions of the entities. This paper introduces an attentive encoder-decoder-based link prediction approach that considers both the structural information of the KG and the textual entity descriptions. A random-walk-based path selection method is used to encapsulate the contextual information of an entity in the KG. The model uses a bidirectional Gated Recurrent Unit (GRU) based encoder-decoder to learn representations of the paths, while SBERT is used to generate representations of the entity descriptions. The proposed approach outperforms most state-of-the-art models and achieves comparable results with the rest when evaluated on the FB15K, FB15K-237, WN18, WN18RR, and YAGO3-10 datasets.
{"title":"MADLINK: Attentive multihop and entity descriptions for link prediction in knowledge graphs","authors":"Russa Biswas, Harald Sack, Mehwish Alam","doi":"10.3233/sw-222960","DOIUrl":"https://doi.org/10.3233/sw-222960","url":null,"abstract":"Knowledge Graphs (KGs) comprise of interlinked information in the form of entities and relations between them in a particular domain and provide the backbone for many applications. However, the KGs are often incomplete as the links between the entities are missing. Link Prediction is the task of predicting these missing links in a KG based on the existing links. Recent years have witnessed many studies on link prediction using KG embeddings which is one of the mainstream tasks in KG completion. To do so, most of the existing methods learn the latent representation of the entities and relations whereas only a few of them consider contextual information as well as the textual descriptions of the entities. This paper introduces an attentive encoder-decoder based link prediction approach considering both structural information of the KG and the textual entity descriptions. Random walk based path selection method is used to encapsulate the contextual information of an entity in a KG. The model explores a bidirectional Gated Recurrent Unit (GRU) based encoder-decoder to learn the representation of the paths whereas SBERT is used to generate the representation of the entity descriptions. The proposed approach outperforms most of the state-of-the-art models and achieves comparable results with the rest when evaluated with FB15K, FB15K-237, WN18, WN18RR, and YAGO3-10 datasets.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"117 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75416724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Chaves-Fraga, Pieter Colpaert, Mersedeh Sadeghi, M. Comerio
Whether you are planning your next trip abroad or want a package delivered to your doorstep, chances are high that you will need a chain of services provided by multiple companies. Transport is inherently a geographically and administratively decentralized domain composed of a diverse set of actors, from public transport authorities to vehicle-sharing companies, infrastructure managers in different sectors (road, rail, etc.), transport operators, retailers, and distributors. As a result, it suffers from vast data heterogeneity, which, in turn, brings severe challenges to data interoperability. Similar challenges have been faced in other domains such as the Internet of Things [18], agriculture [11], building data management [17], biology [7] or open data [2], which have found solutions using Semantic Web technologies. However, despite several research contributions [6,14,19,23,25], publicly funded projects, and academic-industry events, we have not yet seen a wide adoption of semantic technologies in the transport domain. We may only guess the inhibitors for adopting Linked Data in this domain: i) the SPARQL query language is not built for optimal path planning, and ii) RDF is perceived as highly conceptual by industry experts. We argue that SPARQL does not fit well with the concerns that typically matter to route planners (e.g., calculating the optimal Pareto path [4]). While calculating a path with SPARQL is feasible through property paths, controlling the path-planning algorithm, which can hardly be done in SPARQL, is the core concern of route planners. On the other hand, the transport domain is dominated by different standards (e.g., NeTEx or DATEX II) and vocabularies, which are based on legacy data exchange technologies (e.g., XML or RDB). However, to construct a distributed and scalable architecture that addresses the current needs of this domain, the Web and its associated technologies (i.e., the Semantic Web) are the key resource.
{"title":"Editorial of transport data on the web","authors":"David Chaves-Fraga, Pieter Colpaert, Mersedeh Sadeghi, M. Comerio","doi":"10.3233/sw-223278","DOIUrl":"https://doi.org/10.3233/sw-223278","url":null,"abstract":"Whether you are planning your next trip abroad or want a package delivered to your doorstep, chances are high that you will need a chain of services provided by multiple companies. Transport is inherently a geographically and administratively decentralized domain composed of a diverse set of actors, – from public transport authorities to vehicle sharing companies, infrastructure managers in different sectors (road, rail, etc.), transport operators, retailers, and distributors. As a result, it suffers vast data heterogeneity, which, in turn, brings severe challenges to data interoperability. However, such challenges have also been posed in other domains such as the Internet of Things [18], agriculture [11], building data management [17], biology [7] or open data [2], which have found their solutions using semantic web technologies. However, despite several research contributions [6,14,19,23,25], public-funded projects1,2 or academic-industry events,3,4 we have not yet seen a wide adoption of semantic technologies in the transport domain. We may only guess the inhibitors for adopting Linked Data in this domain: i) the SPARQL query language is not built for optimal path planning, and ii) RDF is perceived as highly conceptual by industry experts. We argue that SPARQL does not fit well with the concerns that typically matter to route planners (e.g., calculating the optimal Pareto path [4]). While calculating a path with SPARQL is feasible through property paths, controlling the path planning algorithm, which can hardly be done in SPARQL, is the core concern of route planners. On the other hand, the transport domain is dominated by different standards (e.g., NeTEx,5 or DATEX II6) and vocabularies, which are based on legacy data exchange technologies (e.g., XML or RDB). However, to construct a distributed and scalable architecture that addresses the current needs of this domain, the Web and its associated technologies (i.e., the Semantic Web) are the key resource.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"7 1","pages":"613-616"},"PeriodicalIF":3.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84559368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katherine Thornton, Kenneth Seals-Nutt, Marianne Van Remoortel, Julie M. Birkholz, P. D. Potter
Stories are important tools for recounting and sharing the past. To tell a story, one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of the Semantic Web, can compile scattered and diverse materials and information to construct stories. Through the example of the WeChangEd research project on women editors of periodicals in Europe from 1710–1920, we detail how to move from an archive, to a structured data model and relational database, to Wikidata, and to the use of the Stories Services API to generate multimedia stories about people, organizations and periodicals. As more humanists, social scientists and other researchers choose to contribute their data to Wikidata, we will all benefit: as researchers add data, the breadth and complexity of the questions we can ask about the data we have contributed will increase. Building applications that syndicate data from Wikidata allows us to leverage a general-purpose knowledge graph with a growing number of references back to scholarly literature. Using frameworks developed by the Wikidata community allows us to rapidly provision interactive sites that will help us engage new audiences. The process we detail here may be of interest to other researchers and cultural heritage institutions seeking web-based presentation options for telling stories from their data.
{"title":"Linking women editors of periodicals to the Wikidata knowledge graph","authors":"Katherine Thornton, Kenneth Seals-Nutt, Marianne Van Remoortel, Julie M. Birkholz, P. D. Potter","doi":"10.3233/sw-222845","DOIUrl":"https://doi.org/10.3233/sw-222845","url":null,"abstract":"Stories are important tools for recounting and sharing the past. To tell a story one has to put together diverse information about people, places, time periods, and things. We detail here how a machine, through the power of Semantic Web, can compile scattered and diverse materials and information to construct stories. Through the example of the WeChangEd research project on women editors of periodicals in Europe from 1710–1920 we detail how to move from archive, to a structured data model and relational database, to Wikidata, to the use of the Stories Services API to generate multimedia stories related to people, organizations and periodicals. As more humanists, social scientists and other researchers choose to contribute their data to Wikidata we will all benefit. As researchers add data, the breadth and complexity of the questions we can ask about the data we have contributed will increase. Building applications that syndicate data from Wikidata allows us to leverage a general purpose knowledge graph with a growing number of references back to scholarly literature. Using frameworks developed by the Wikidata community allows us to rapidly provision interactive sites that will help us engage new audiences. This process that we detail here may be of interest to other researchers and cultural heritage institutions seeking web-based presentation options for telling stories from their data.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"20 1","pages":"443-455"},"PeriodicalIF":3.0,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90501657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pouya Ghiasnezhad Omran, K. Taylor, Sergio J. Rodríguez Méndez, A. Haller
Knowledge Graphs (KGs) have proliferated on the Web since the introduction of knowledge panels to Google search in 2012. KGs are large data-first graph databases with weak inference rules and weakly constraining data schemas. SHACL, the Shapes Constraint Language, is a W3C recommendation for expressing constraints on graph data as shapes. SHACL shapes serve to validate a KG, to underpin manual KG editing tasks, and to offer insight into KG structure. In practice, however, large KGs often have no available shape constraints and so cannot obtain these benefits for ongoing maintenance and extension. We introduce Inverse Open Path (IOP) rules, a predicate logic formalism that expresses specific shapes in the form of paths over connected entities present in a KG. IOP rules express simple shape patterns that can be augmented with minimum cardinality constraints and also serve as building blocks for more complex shapes, such as trees and other rule patterns. We define formal quality measures for IOP rules and propose a novel method to learn high-quality rules from KGs. We show how to build high-quality tree shapes from the IOP rules. Our learning method, SHACLearner, is adapted from a state-of-the-art embedding-based open path rule learner (Oprl). We evaluate SHACLearner on several real-world massive KGs, including YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts). The experiments show that SHACLearner can effectively learn informative and intuitive shapes from massive KGs. The shapes are diverse in structural features such as depth and width, as well as in quality measures that indicate confidence and generality.
{"title":"Learning SHACL shapes from knowledge graphs","authors":"Pouya Ghiasnezhad Omran, K. Taylor, Sergio J. Rodríguez Méndez, A. Haller","doi":"10.3233/sw-223063","DOIUrl":"https://doi.org/10.3233/sw-223063","url":null,"abstract":"Knowledge Graphs (KGs) have proliferated on the Web since the introduction of knowledge panels to Google search in 2012. KGs are large data-first graph databases with weak inference rules and weakly-constraining data schemes. SHACL, the Shapes Constraint Language, is a W3C recommendation for expressing constraints on graph data as shapes. SHACL shapes serve to validate a KG, to underpin manual KG editing tasks, and to offer insight into KG structure. Often in practice, large KGs have no available shape constraints and so cannot obtain these benefits for ongoing maintenance and extension. We introduce Inverse Open Path (IOP) rules, a predicate logic formalism which presents specific shapes in the form of paths over connected entities that are present in a KG. IOP rules express simple shape patterns that can be augmented with minimum cardinality constraints and also used as a building block for more complex shapes, such as trees and other rule patterns. We define formal quality measures for IOP rules and propose a novel method to learn high-quality rules from KGs. We show how to build high-quality tree shapes from the IOP rules. Our learning method, SHACLearner, is adapted from a state-of-the-art embedding-based open path rule learner (Oprl). We evaluate SHACLearner on some real-world massive KGs, including YAGO2s (4M facts), DBpedia 3.8 (11M facts), and Wikidata (8M facts). The experiments show that our SHACLearner can effectively learn informative and intuitive shapes from massive KGs. The shapes are diverse in structural features such as depth and width, and also in quality measures that indicate confidence and generality.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"5 1","pages":"101-121"},"PeriodicalIF":3.0,"publicationDate":"2022-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83798341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. E. Labra Gayo, Anastasia Dimou, Katherine Thornton, A. Rula
{"title":"Editorial of knowledge graphs validation and quality","authors":"J. E. Labra Gayo, Anastasia Dimou, Katherine Thornton, A. Rula","doi":"10.3233/sw-223261","DOIUrl":"https://doi.org/10.3233/sw-223261","url":null,"abstract":"","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"41 1","pages":"3-4"},"PeriodicalIF":3.0,"publicationDate":"2022-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84093431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since the inception of the Open Linguistics Working Group in 2010, there have been numerous efforts to transform language resources into Linked Data. The research field of Linguistic Linked Data (LLD) has gained in importance, visibility and impact, with the Linguistic Linked Open Data (LLOD) cloud now gathering over 200 resources. With this growth, new challenges have emerged concerning particular domain and task applications, quality dimensions, and linguistic features to take into account. This special issue aims to review and summarize the progress and status of LLD research in recent years, as well as to offer an understanding of the challenges the field faces in the years to come. The papers in this issue indicate that there are still aspects to address for a wider community adoption of LLD, as well as a lack of resources for specific tasks and (interdisciplinary) domains. Likewise, the integration of LLD resources into Natural Language Processing (NLP) architectures and the search for long-term infrastructure solutions to host LLD resources remain essential points to attend to in the foreseeable future of this research line.
{"title":"Editorial of the Special Issue on Latest Advancements in Linguistic Linked Data","authors":"Julia Bosque-Gil, P. Cimiano, Milan Dojchinovski","doi":"10.3233/sw-223251","DOIUrl":"https://doi.org/10.3233/sw-223251","url":null,"abstract":"Since the inception of the Open Linguistics Working Group in 2010, there have been numerous efforts in transforming language resources into Linked Data. The research field of Linguistic Linked Data (LLD) has gained in importance, visibility and impact, with the Linguistic Linked Open Data (LLOD) cloud gathering nowadays over 200 resources. With this increasing growth, new challenges have emerged concerning particular domain and task applications, quality dimensions, and linguistic features to take into account. This special issue aims to review and summarize the progress and status of LLD research in recent years, as well as to offer an understanding of the challenges ahead of the field for the years to come. The papers in this issue indicate that there are still aspects to address for a wider community adoption of LLD, as well as a lack of resources for specific tasks and (interdisciplinary) domains. Likewise, the integration of LLD resources into Natural Language Processing (NLP) architectures and the search for long-term infrastructure solutions to host LLD resources continue to be essential points to which to attend in the foreseeable future of the research line.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"96 1","pages":"911-916"},"PeriodicalIF":3.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80912659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Basel Shbita, Craig A. Knoblock, Weiwei Duan, Yao-Yi Chiang, Johannes H. Uhl, S. Leyk
Historical maps provide rich information for researchers in many areas, including the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as changes in transportation networks or the decline of wetlands or forest areas. Analyzing changes over time in such maps can be labor-intensive for a scientist, even after the geographic features have been digitized and converted to a vector format. Knowledge Graphs (KGs) are the appropriate representations to store and link such data and support semantic and temporal querying to facilitate change analysis. KGs combine expressivity, interoperability, and standardization in the Semantic Web stack, thus providing a strong foundation for querying and analysis. In this paper, we present an automatic approach to convert vector geographic features extracted from multiple historical maps into contextualized spatio-temporal KGs. The resulting graphs can be easily queried and visualized to understand the changes in different regions over time. We evaluate our technique on railroad networks and wetland areas extracted from the United States Geological Survey (USGS) historical topographic maps for several regions over multiple map sheets and editions. We also demonstrate how the automatically constructed linked data (i.e., KGs) enable effective querying and visualization of changes over different points in time.
{"title":"Building spatio-temporal knowledge graphs from vectorized topographic historical maps","authors":"Basel Shbita, Craig A. Knoblock, Weiwei Duan, Yao-Yi Chiang, Johannes H. Uhl, S. Leyk","doi":"10.3233/sw-222918","DOIUrl":"https://doi.org/10.3233/sw-222918","url":null,"abstract":"Historical maps provide rich information for researchers in many areas, including the social and natural sciences. These maps contain detailed documentation of a wide variety of natural and human-made features and their changes over time, such as changes in transportation networks or the decline of wetlands or forest areas. Analyzing changes over time in such maps can be labor-intensive for a scientist, even after the geographic features have been digitized and converted to a vector format. Knowledge Graphs (KGs) are the appropriate representations to store and link such data and support semantic and temporal querying to facilitate change analysis. KGs combine expressivity, interoperability, and standardization in the Semantic Web stack, thus providing a strong foundation for querying and analysis. In this paper, we present an automatic approach to convert vector geographic features extracted from multiple historical maps into contextualized spatio-temporal KGs. The resulting graphs can be easily queried and visualized to understand the changes in different regions over time. We evaluate our technique on railroad networks and wetland areas extracted from the United States Geological Survey (USGS) historical topographic maps for several regions over multiple map sheets and editions. We also demonstrate how the automatically constructed linked data (i.e., KGs) enable effective querying and visualization of changes over different points in time.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"3 1","pages":"527-549"},"PeriodicalIF":3.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90985201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology matching is an integral part of establishing semantic interoperability. One of the main challenges in the ontology matching task is semantic heterogeneity, i.e., modeling differences between the two ontologies that are to be integrated. The semantics within most ontologies or schemas are, however, typically incomplete because they are designed within a certain context that is not explicitly modeled. Therefore, external background knowledge plays a major role in (semi-)automated ontology and schema matching. In this survey, we introduce the reader to the general ontology matching problem. We review background knowledge sources as well as the approaches applied to make use of external knowledge. Our survey covers all ontology matching systems presented between 2004 and 2021 at a well-known ontology matching competition, together with systematically selected publications in the research field. We present classification systems for external background knowledge, concept linking strategies, and background knowledge exploitation approaches. We provide extensive examples and classify all ontology matching systems under review in a resource/strategy matrix obtained by coalescing the two classification systems. Lastly, we outline interesting and yet underexplored research directions for applying external knowledge within the ontology matching process.
{"title":"Background knowledge in ontology matching: A survey","authors":"Jan Portisch, M. Hladik, Heiko Paulheim","doi":"10.3233/sw-223085","DOIUrl":"https://doi.org/10.3233/sw-223085","url":null,"abstract":"Ontology matching is an integral part for establishing semantic interoperability. One of the main challenges within the ontology matching operation is semantic heterogeneity, i.e. modeling differences between the two ontologies that are to be integrated. The semantics within most ontologies or schemas are, however, typically incomplete because they are designed within a certain context which is not explicitly modeled. Therefore, external background knowledge plays a major role in the task of (semi-) automated ontology and schema matching. In this survey, we introduce the reader to the general ontology matching problem. We review the background knowledge sources as well as the approaches applied to make use of external knowledge. Our survey covers all ontology matching systems that have been presented within the years 2004–2021 at a well-known ontology matching competition together with systematically selected publications in the research field. We present a classification system for external background knowledge, concept linking strategies, as well as for background knowledge exploitation approaches. We provide extensive examples and classify all ontology matching systems under review in a resource/strategy matrix obtained by coalescing the two classification systems. Lastly, we outline interesting and yet underexplored research directions of applying external knowledge within the ontology matching process.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"37 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84360736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle density based method: partitioning the graph into biconnected components for a speed-up, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Over twenty-seven language pairs, our algorithm produces dictionaries about 70% the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use-case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software which can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.
{"title":"Bilingual dictionary generation and enrichment via graph exploration","authors":"Shashwat Goel, Jorge Gracia, M. Forcada","doi":"10.3233/sw-222899","DOIUrl":"https://doi.org/10.3233/sw-222899","url":null,"abstract":"In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle density based method: partitioning the graph into biconnected components for a speed-up, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Over twenty-seven language pairs, our algorithm produces dictionaries about 70% the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use-case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software which can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"14 1","pages":"1103-1132"},"PeriodicalIF":3.0,"publicationDate":"2022-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76042247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
News consumption has shifted over time from traditional media to online platforms, which use recommendation algorithms to help users navigate through the large incoming streams of daily news by suggesting relevant articles based on their preferences and reading behavior. In comparison to domains such as movies or e-commerce, where recommender systems have proved highly successful, the characteristics of the news domain (e.g., high frequency of articles appearing and becoming outdated, greater dynamics of user interest, less explicit relations between articles, and lack of explicit user feedback) pose additional challenges for the recommendation models. While some of these can be overcome by conventional recommendation techniques, injecting external knowledge into news recommender systems has been proposed in order to enhance recommendations by capturing information and patterns not contained in the text and metadata of articles, and hence, tackle shortcomings of traditional models. This survey provides a comprehensive review of knowledge-aware news recommender systems. We propose a taxonomy that divides the models into three categories: neural methods, non-neural entity-centric methods, and non-neural path-based methods. Moreover, the underlying recommendation algorithms, as well as their evaluations are analyzed. Lastly, open issues in the domain of knowledge-aware news recommendations are identified and potential research directions are proposed.
{"title":"A survey on knowledge-aware news recommender systems","authors":"Andreea Iana, Mehwish Alam, Heiko Paulheim","doi":"10.3233/sw-222991","DOIUrl":"https://doi.org/10.3233/sw-222991","url":null,"abstract":"News consumption has shifted over time from traditional media to online platforms, which use recommendation algorithms to help users navigate through the large incoming streams of daily news by suggesting relevant articles based on their preferences and reading behavior. In comparison to domains such as movies or e-commerce, where recommender systems have proved highly successful, the characteristics of the news domain (e.g., high frequency of articles appearing and becoming outdated, greater dynamics of user interest, less explicit relations between articles, and lack of explicit user feedback) pose additional challenges for the recommendation models. While some of these can be overcome by conventional recommendation techniques, injecting external knowledge into news recommender systems has been proposed in order to enhance recommendations by capturing information and patterns not contained in the text and metadata of articles, and hence, tackle shortcomings of traditional models. This survey provides a comprehensive review of knowledge-aware news recommender systems. We propose a taxonomy that divides the models into three categories: neural methods, non-neural entity-centric methods, and non-neural path-based methods. Moreover, the underlying recommendation algorithms, as well as their evaluations are analyzed. Lastly, open issues in the domain of knowledge-aware news recommendations are identified and potential research directions are proposed.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"24 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77169110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}