A. Schätzle, Antony Neu, G. Lausen, Martin Przyjaciel-Zablocki
Large-scale bisimulation of RDF graphs
SWIM '13, 23 June 2013. DOI: 10.1145/2484712.2484713
RDF datasets with billions of triples are no longer unusual, and they continue to grow (e.g., the LOD cloud). This growth is driven by the inherent flexibility of RDF, which can represent very diverse datasets, ranging from highly structured to unstructured data. Because of their size, RDF graphs are often difficult to understand and process, so methods that reduce their size while preserving as much of their structural information as possible are attractive. In this paper we study bisimulation as a means to reduce the size of RDF graphs according to structural equivalence. We study two bisimulation algorithms: one for sequential execution using SQL and one for distributed execution using MapReduce. We demonstrate that the MapReduce-based implementation scales linearly with the number of RDF triples, which makes it possible to compute the bisimulation of very large RDF graphs in far less time than the sequential version could achieve. Experiments on synthetic benchmark data and real data (DBpedia) show a size reduction of more than 90%, measured from the number of nodes in the RDF graph to the number of blocks in the resulting bisimulation partition.
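To make "reducing an RDF graph to the blocks of a bisimulation partition" concrete, here is a minimal sketch of naive signature-based partition refinement. It assumes forward bisimulation (two nodes are equivalent if their outgoing predicate-labeled edges lead to equivalent nodes), an initial partition that places every node in a single block, and purely structural treatment of literals. The function name `bisimulation_blocks` is hypothetical, and this is not the authors' SQL or MapReduce implementation; each round is merely written as a "map" step (emit per-node edge signatures) and a "reduce" step (group nodes by signature) to suggest how the refinement can be distributed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def bisimulation_blocks(triples: Iterable[Triple]) -> Dict[str, int]:
    """Map each node to a block id of the coarsest forward bisimulation.

    Sketch under stated assumptions: all nodes start in one block, and
    literals are treated as ordinary nodes without outgoing edges.
    """
    triples = list(triples)
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    block = {n: 0 for n in nodes}  # initial partition: a single block
    num_blocks = 1
    while True:
        # "Map": each triple contributes (predicate, block of object) to its subject.
        out_edges: Dict[str, set] = defaultdict(set)
        for s, p, o in triples:
            out_edges[s].add((p, block[o]))
        # A node's signature is its current block plus its outgoing
        # (predicate, target block) pairs, so each round only splits blocks.
        signature = {
            n: (block[n], frozenset(out_edges.get(n, ()))) for n in nodes
        }
        # "Reduce": nodes with equal signatures share a new block id.
        ids: Dict[tuple, int] = {}
        new_block = {n: ids.setdefault(signature[n], len(ids)) for n in sorted(nodes)}
        if len(ids) == num_blocks:
            return new_block  # no block was split: the partition is stable
        block, num_blocks = new_block, len(ids)
```

For example, with the triples `("alice", "knows", "bob")`, `("carol", "knows", "dave")`, `("bob", "name", "lit1")` and `("dave", "name", "lit2")`, the six nodes collapse into three blocks: {alice, carol}, {bob, dave} and {lit1, lit2}. Adding `("alice", "age", "lit3")` separates alice from carol, because their outgoing edges no longer match.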