Towards a model for replicating aesthetic literary appreciation. T. Crosbie, Timothy French, M. Conrad. SWIM '13. doi:10.1145/2484712.2484720
This study aims to bridge the gap between subjective literary criticism and natural language processing by creating a model that emulates the results of a survey of literary tastes. A panel of human experts assessed segments of literary text according to how aesthetically pleasing they found them. These segments were then rated for literariness in an open survey using a Likert scale. Each segment was processed with a part-of-speech tagger using NLTK, and the results were compared with those of the survey. Following a Grounded Theory approach, experiments with various combinations of parts of speech were carried out to build a model that could replicate the results of the open survey. The success of this approach confirms the feasibility of using the method to create a more accurate and analytical model of literary criticism involving deeper stylistic markers.
Social infobuttons: integrating open health data with social data using semantic technology. Xiang Ji, Soon Ae Chun, J. Geller. SWIM '13. doi:10.1145/2484712.2484718
There is a large amount of free health information available for a patient to address her health concerns. HealthData.gov includes readily downloadable community health datasets at the national, state and community levels. There are also patient-generated datasets, accessible through social media, on the conditions, treatments and side effects that individual patients experience. While caring for patients, clinicians and other healthcare providers may benefit from the integrated information and knowledge embedded in these open health datasets, such as national health trends and social health trends drawn from patient-generated healthcare experiences. However, the open health datasets are distributed and range from structured to highly unstructured. An information seeker has to spend time visiting many, possibly irrelevant, websites, and has to select relevant information from each and integrate it into a coherent mental model. In this paper, we present a Linked Data approach to integrating these health data sources and presenting contextually relevant information, called Social InfoButtons, to healthcare professionals and patients. We present methods for data extraction, semantic linked data integration and visualization. A Social InfoButtons prototype system provides awareness of community and patient health issues and healthcare trends that may shed light on patient care and health policy decisions.
Scalable reconstruction of RDF-archived relational databases. S. Stefanova, T. Risch. SWIM '13. doi:10.1145/2484712.2484717
We have investigated approaches for the scalable reconstruction of relational databases (RDBs) archived as RDF files. An archived RDB is reconstructed from a data archive file and a schema archive file, both in N-Triples format. The archives contain RDF triples representing the archived relational data content and the relational schema describing that content, respectively. When an archived RDB is to be reconstructed, the schema archive is first read to automatically create the RDB schema using a schema reconstruction algorithm that identifies RDB elements by queries to the schema archive. The RDB thus created is then populated by reading the data archive. To populate the RDB we have developed two approaches, the naive Insert Attribute Value (IAV) approach and the Triple Bulk Load (TBL) approach. With the IAV approach, the database is populated by stored procedures that execute SQL INSERT or UPDATE statements to insert attribute values into the RDB tables. In the more complex TBL approach, the database is populated by bulk loading CSV files generated by sorting the data archive triples joined with schema information. Our experiments show that the TBL approach is substantially faster than the IAV approach.
Overcoming limitations of term-based partitioning for distributed RDFS reasoning. Tugba Kulahcioglu, Hasan Bulut. SWIM '13. doi:10.1145/2484712.2484719
RDFS reasoning is carried out via the shared terms of triples; accordingly, a distributed reasoning approach should bring together triples that have terms in common. To achieve this, term-based partitioning distributes triples to partitions based on the terms they include. However, the skewed distribution of Semantic Web data results in unbalanced load distribution. A single peer must be able to handle even the largest partition, and this requirement limits scalability. The approach also suffers from data replication, since a triple is sent to multiple partitions. In this paper, we propose a two-step method to overcome these limitations. Our RDFS-specific term-based partitioning algorithm applies a selective distribution policy and distributes triples with minimal replication. Our schema-sensitive processing approach eliminates non-productive partitions and enables processing of a partition regardless of its size. The resulting partitions reach full closure without replicating the global schema and without the fix-point iteration suggested by previous studies.
Scalable containment for unions of conjunctive queries under constraints. G. Konstantinidis, J. Ambite. SWIM '13. doi:10.1145/2484712.2484716
We consider the problem of query containment under ontological constraints, such as those of RDFS. Query containment, i.e., deciding whether the answers of a given query are always contained in the answers of another query, is an important problem in areas such as database theory and knowledge representation, with applications to data integration, query optimization and minimization. We consider unions of conjunctive queries, which constitute the core of structured query languages such as SPARQL and SQL. We also consider ontological constraints, or axioms, expressed as Tuple-Generating Dependencies (TGDs). TGDs capture RDF/S and fragments of Description Logics. We consider classes of TGDs for which the chase is known to terminate. Query containment under chase-terminating axioms can be decided by first running the chase on one of the two queries and then relying on classic relational containment. When considering unions of conjunctive queries, classic algorithms for both the chase and containment phases suffer from a large degree of redundancy. We leverage a graph-based modeling of rules that represents multiple queries in a compact form by exploiting shared patterns among them. As a result, we couple the chase and containment phases and obtain a faster and more scalable algorithm. Our experiments show a speedup of close to two orders of magnitude.
Semantic description of OData services. M. Kirchhoff, K. Geihs. SWIM '13. doi:10.1145/2484712.2484714
The Open Data Protocol (OData) is a data access protocol based on REST principles. It is built upon existing and well-known technologies such as HTTP, AtomPub and JSON. OData is already widely used in industry, and many IT companies provide OData interfaces for their applications. The structure of the data provided by an OData service is described with the Conceptual Schema Definition Language (CSDL). To make this data available for integration with the Semantic Web, we propose semantically annotating CSDL documents. This extension of CSDL allows the definition of mappings from the underlying Entity Data Model (EDM) to RDF graphs, which is a first step towards implementing a SPARQL endpoint on top of existing OData services. Based on the OData interfaces of existing enterprise resource planning (ERP) systems, it is possible to realize a SPARQL endpoint for those systems, which can greatly simplify data retrieval.
Large-scale bisimulation of RDF graphs. A. Schätzle, Antony Neu, G. Lausen, Martin Przyjaciel-Zablocki. SWIM '13. doi:10.1145/2484712.2484713
RDF datasets with billions of triples are no longer unusual and continue to grow constantly (e.g. the LOD cloud), driven by the inherent flexibility of RDF, which makes it possible to represent very diverse datasets, ranging from highly structured to unstructured data. Because of their size, understanding and processing RDF graphs is often difficult, and methods that reduce their size while preserving as much of their structural information as possible become attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms, one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, allowing the bisimulation of very large RDF graphs to be computed in a time that is far out of reach for the sequential version. Experiments on synthetic benchmark data and real data (DBpedia) show a reduction of more than 90% in graph size, measured by comparing the number of nodes to the number of blocks in the resulting bisimulation partition.
LOP: capturing and linking open provenance on LOD cycle. Rogers Reiche de Mendonça, Sérgio Manuel Serra da Cruz, Jonas F. S. M. de La Cerda, M. C. Cavalcanti, K. F. Cordeiro, M. Campos. SWIM '13. doi:10.1145/2484712.2484715
The Web of Data has emerged as a means to expose, share, reuse and connect information on the Web, identified by URIs and using RDF as a data model, following the Linked Data principles. However, the reuse of third-party data can be compromised without proper data quality assessments. In this context, important questions emerge: how can one trust published data and links? Which manipulation, modification and integration operations were applied to the data before its publication? What is the nature of the comparisons or transformations applied to the data during the interlinking process? In this scenario, provenance becomes a fundamental element. In this paper, we describe an approach for generating and capturing Linked Open Provenance (LOP) to support data quality and trustworthiness assessments, covering the preparation and format transformation of traditional data sources through to dataset publication and interlinking. The proposed architecture takes advantage of provenance agents, orchestrated by an ETL workflow approach, to collect provenance at any specified level and to link it with its corresponding data. We also describe a real use case in which the architecture was implemented to evaluate the proposal.