
Proceedings of the International Workshop on Semantic Big Data: Latest Articles

What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning
Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393637
A. Zouaq, Félix Martel
Large-scale knowledge graphs have become prevalent on the Web and have proven useful for several tasks. One challenge associated with knowledge graphs is the need to maintain a schema (generally defined manually) that accurately reflects the graph's content. In this paper, we present an approach that extracts an expressive taxonomy based on knowledge graph embeddings, linked data statistics, and clustering. Our results show that the learned taxonomy not only retains the original classes but also identifies new ones, thus giving an up-to-date view of the knowledge graph.
Citations: 12
Relaxing global-as-view in mediated data integration from linked data
Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393635
A. Adamou, M. d’Aquin
In scenarios where many different, independent, and dynamic data sources need to be brought together, mediated data integration at runtime is rapidly gaining interest. In a global-as-view approach, schema mappings express how to get data from each data source according to the global schema of the mediator. Key issues include the effort required to include and map new data sources, and the fact that data sources need the global schema to be expressed at all. It has been argued that the principles of Linked Data can be used to spread the cost of adding new sources in a pay-as-you-go model. We contribute a data integration framework that mitigates these issues by relating data sources under a global schema that is implicit and only partly known at the time a new data source joins. Mappings over a data source require only partial knowledge of that source and of the part of the global schema it will affect. Pay-as-you-go can then be employed to guarantee eventual schema compliance. This approach was adopted in a large-scale data integration system for Smart Cities, where it allowed short time-to-publish for new data and iterative schema refinements.
Citations: 2
Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets
Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393632
Mohamed Ragab, Riccardo Tommasini, Sadiq Eyvazov, S. Sakr
Recently, a wide range of Web applications (e.g., DBPedia, Uniprot, and Probase) have been built on top of vast RDF knowledge bases using the SPARQL query language. The continuous growth of these knowledge bases has led to the investigation of new paradigms and technologies for storing, accessing, and querying RDF data. In practice, modern big data systems (e.g., Hadoop, Spark) can handle vast relational repositories; however, their application in the Semantic Web context is still limited. One possible reason is that such frameworks rely on distributed systems that work well for relational data, while their performance on graph data models like RDF has not yet been well studied. In this paper, we present a systematic evaluation of the performance of the SparkSQL engine for processing SPARQL queries. We conducted the evaluation using three relevant RDF relational schemas and two different storage backends, namely Hive and HDFS. In addition, we show the impact of using three different RDF-based partitioning techniques with our relational scenario. Finally, we discuss the results of our experiments: (i) we present insights about the trade-offs that characterize different experimental configurations, and (ii) we identify the best and the worst configurations for the SP2Bench benchmark scenario.
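As a rough illustration of what an "RDF relational schema" can look like, the sketch below stores the same triples in a single triple table and in a vertically partitioned layout (one table per predicate), then runs the SQL equivalent of a two-pattern SPARQL query against both. The abstract does not name the three schemas that were evaluated, so these two layouts are assumptions; SQLite stands in for Spark-SQL, and the data, table names, and function names are invented for the example.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("paper1", "author", "Alice"),
    ("paper1", "title", "Graphs"),
    ("paper2", "author", "Bob"),
    ("paper2", "title", "Triples"),
    ("paper2", "year", "2020"),
]


def build_database(triples):
    """Load triples into a triple table and into one table per predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # Vertical partitioning: predicates become table names.
    # They are trusted identifiers here, so string formatting is acceptable.
    for pred in sorted({p for _s, p, _o in triples}):
        conn.execute(f"CREATE TABLE {pred} (s TEXT, o TEXT)")
        conn.execute(
            f"INSERT INTO {pred} SELECT s, o FROM triples WHERE p = ?", (pred,)
        )
    return conn


# SPARQL: SELECT ?name ?title WHERE { ?x author ?name . ?x title ?title }
def query_triple_table(conn):
    """Answer the query with a self-join on the single triple table."""
    sql = (
        "SELECT a.o, t.o FROM triples a JOIN triples t ON a.s = t.s "
        "WHERE a.p = 'author' AND t.p = 'title' ORDER BY a.o"
    )
    return conn.execute(sql).fetchall()


def query_vertical_partitions(conn):
    """Answer the same query by joining the two predicate tables."""
    sql = (
        "SELECT author.o, title.o FROM author JOIN title "
        "ON author.s = title.s ORDER BY author.o"
    )
    return conn.execute(sql).fetchall()


conn = build_database(TRIPLES)
print(query_triple_table(conn))         # [('Alice', 'Graphs'), ('Bob', 'Triples')]
print(query_vertical_partitions(conn))  # [('Alice', 'Graphs'), ('Bob', 'Triples')]
```

Both layouts return the same answers; what differs is how much data each join has to scan, which is the kind of trade-off such an evaluation measures.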
Citations: 5
Ten ways of leveraging ontologies for natural language processing and its enterprise applications
Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393639
T. Erekhinskaya, D. Strebkov, Sujal Patel, Mithun Balakrishna, M. Tatu, D. Moldovan
In recent years, Artificial Intelligence and Deep Learning have matured from a fascinating research area to real-world applications across multiple domains. Enterprises adopt data-driven approaches for various use cases. With this increased adoption, issues such as model governance, deployment, scalability, reusability, and maintenance are widely addressed on the engineering side, but not so much on the knowledge side. In this paper, we demonstrate 10 ways of leveraging ontologies for Natural Language Processing. Specifically, we explore the use of ontologies and related standards for labeling schemas, configuration, providing lexical data, powering rule engines and automated rule generation, and providing a standard output format. Additionally, we discuss three NLP-based applications (semantic search, question answering, and natural language querying) and show how they can benefit from ontology usage. The paper summarizes our experience of using ontologies in a number of projects in the medical, enterprise, financial, legal, and security domains.
Citations: 2
Triag, a framework based on triangles of RDF triples
Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393634
Hubert Naacke, Olivier Curé
The success of RDF-based enterprise Knowledge Graphs partly depends on how efficiently SPARQL queries can be served over large datasets. This usually requires optimizing a large number of joins between a query's triple patterns. A common solution to this problem is to index triples in several orders and to provide adapted query processing optimizations. In this paper, we extend this approach by proposing a framework that tackles a frequently encountered basic graph pattern: triangles. We present appropriate data structures to store these triangles, provide distributed algorithms to discover and materialize them (including inferred triangles), and detail query optimization techniques. Experimental results obtained with an Apache Spark implementation on two real-world RDF datasets emphasize the performance boost achieved with our approach.
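To make the triangle pattern concrete, here is a minimal single-machine sketch that finds every set of three resources that are pairwise connected by some triple. It ignores edge direction and predicates, and it is not the distributed discovery algorithm or the storage structure described in the paper; the function name and sample triples are invented for illustration.

```python
from collections import defaultdict


def find_triangles(triples):
    """Return triangles as sorted (a, b, c) node tuples.

    Two nodes count as connected when any triple links them,
    in either direction and with any predicate.
    """
    neighbors = defaultdict(set)
    for s, _p, o in triples:
        if s != o:  # self-loops cannot be part of a triangle
            neighbors[s].add(o)
            neighbors[o].add(s)

    triangles = set()
    for a in neighbors:
        for b in neighbors[a]:
            if b <= a:
                continue  # enforce a < b so each triangle is seen once
            # A common neighbor c of a and b closes the triangle.
            for c in neighbors[a] & neighbors[b]:
                if c > b:
                    triangles.add((a, b, c))
    return triangles


triples = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
]
print(find_triangles(triples))  # {('acme', 'alice', 'bob')}
```

Materializing such triangles ahead of time trades storage for fewer joins at query time, which is the motivation the abstract gives for treating them as a first-class structure.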
Citations: 3
Automated ontology-based annotation of scientific literature using deep learning
Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393636
Prashanti Manda, S. SayedAhmed, S. Mohanty
Representing scientific knowledge using ontologies enables data integration, consistent machine-readable data representation, and large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F-1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of only the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded modest performance improvement.
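The top-two observation can be illustrated with a small top-k accuracy function: a prediction counts as correct when the gold label is among the k most probable labels. This is only a sketch of the metric; the label names and probabilities below are invented, not taken from CRAFT or from the paper's results.

```python
def top_k_accuracy(probabilities, gold_labels, k):
    """Fraction of instances whose gold label is among the k likeliest labels.

    probabilities: one dict per instance, mapping label -> predicted probability.
    gold_labels:   the correct label for each instance, in the same order.
    """
    hits = 0
    for probs, gold in zip(probabilities, gold_labels):
        # Rank labels from most to least probable and keep the first k.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_labels)


# Hypothetical model outputs for four tokens over three labels.
predictions = [
    {"A": 0.6, "B": 0.3, "O": 0.1},
    {"A": 0.5, "B": 0.4, "O": 0.1},
    {"A": 0.2, "B": 0.3, "O": 0.5},
    {"A": 0.5, "B": 0.2, "O": 0.3},
]
gold = ["A", "B", "O", "B"]

print(top_k_accuracy(predictions, gold, k=1))  # 0.5
print(top_k_accuracy(predictions, gold, k=2))  # 0.75
```

In this toy data the second instance is a near miss: the gold label is ranked second, so it is wrong under top-one scoring but correct under top-two.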
Citations: 4
QSGG
Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393638
S. Böttcher, Rita Hartel, S. Peeters
Like [1], we present QSGG, an algorithm to compute the simulation of a query pattern in a graph of labeled nodes and unlabeled edges. However, our algorithm, QSGG, works on a compressed graph grammar, instead of on the original graph. The speed-up of QSGG compared to a previous algorithm [1] grows with the size of the graph and with the compression strength of the grammar.
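For readers unfamiliar with graph simulation, the sketch below computes it on an ordinary, uncompressed graph with labeled nodes and unlabeled edges. It is the textbook fixpoint formulation, not QSGG itself, which the abstract says operates on a compressed graph grammar; the function name and the example graph are invented.

```python
def graph_simulation(query_labels, query_edges, graph_labels, graph_edges):
    """Return, for each query node, the set of graph nodes that simulate it.

    A graph node v simulates a query node u when both carry the same label and,
    for every query edge (u, u2), v has an outgoing edge to a node simulating u2.
    """
    successors = {v: set() for v in graph_labels}
    for src, dst in graph_edges:
        successors[src].add(dst)

    # Start from all label-compatible candidates, then prune to a fixpoint.
    sim = {
        u: {v for v, label in graph_labels.items() if label == query_labels[u]}
        for u in query_labels
    }
    changed = True
    while changed:
        changed = False
        for u, u_child in query_edges:
            keep = {v for v in sim[u] if successors[v] & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Query: a Person node with an edge to a Paper node.
query_labels = {"q1": "Person", "q2": "Paper"}
query_edges = [("q1", "q2")]

# Graph: only person "a" points to a paper; "b" has no outgoing edge.
graph_labels = {"a": "Person", "b": "Person", "p": "Paper", "c": "Paper"}
graph_edges = [("a", "p")]

result = graph_simulation(query_labels, query_edges, graph_labels, graph_edges)
print(sorted(result["q1"]))  # ['a']
print(sorted(result["q2"]))  # ['c', 'p']
```

Node "b" is pruned because it has no edge to any paper, while both papers match "q2" because that query node has no outgoing edges to satisfy.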
Citations: 0
SustainOnt: an ontology for defining an index of neighborhood sustainability across domains
Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393640
Vatricia Edgar, Cecilia La Place, Julia Schmidt, A. Bansal, S. Bansal
Massive amounts of data, both structured and unstructured, are available to be harvested for competitive business advantage, sound government policies, and new insights in a broad array of applications. This paper specifically focuses on the extraction, integration, and querying of open data about environmental sustainability. The global trend toward urbanization has created a need for residents of urban neighborhoods to better understand the factors impacting the social, environmental, and economic sustainability of an area. To date, there is no concise representation of all aspects of sustainability. This paper aims to fill this gap. A definition of sustainability resting on economic, societal, and environmental development as the three main indicators was chosen to inform an ontology called SustainOnt, which is used to organize and analyze relevant data from various sources. The newly linked data is made available through a dual-platform application aimed at reaching a wide array of audiences. An initial prototype has been designed, using data for a small region, to provide a sustainability index for each city and/or neighborhood area that is more accessible to people without the means to directly analyze the available data.
Citations: 1
Data placement strategies that speed-up distributed graph query processing
Pub Date : 2020-04-05 DOI: 10.1145/3391274.3393633
Daniel Janke, Steffen Staab, Martin Leinberger
We consider the problem of how to optimize data distribution to improve query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be evenly balanced among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both. To investigate this hypothesis, we analyze two such strategies: (1) overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries (by between 5% and 98%). While overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.
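The hash-based baseline mentioned in the abstract can be sketched as follows: each triple is assigned to a compute node by hashing one of its terms. Hashing the subject is an assumption made here for illustration (the abstract does not say which term is hashed), and neither the edge-cut cover nor the molecule hash cover is implemented; all names and data are invented.

```python
import hashlib


def node_for(subject, num_nodes):
    """Map a subject to a compute node with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would give a different placement on each run.
    """
    digest = hashlib.sha256(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(triples, num_nodes):
    """Return a dict mapping node index -> list of triples placed on that node."""
    placement = {node: [] for node in range(num_nodes)}
    for triple in triples:
        placement[node_for(triple[0], num_nodes)].append(triple)
    return placement


triples = [
    ("paper1", "author", "alice"),
    ("paper1", "cites", "paper2"),
    ("paper2", "author", "bob"),
    ("alice", "worksAt", "acme"),
]
placement = distribute(triples, num_nodes=3)
for node, chunk in placement.items():
    print(node, chunk)
```

All triples sharing a subject land on the same node, so star-shaped queries around one subject need no data transfer, but a path such as paper1 → alice → acme may span nodes. That gap is what the collocating strategies in the abstract aim to close.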
Citations: 4
Proceedings of the International Workshop on Semantic Big Data
Pub Date : 2018-06-10 DOI: 10.1145/3208352
Citations: 0