
Latest Publications: PhD '12

Linking records in dynamic world
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213612
Pei Li
In the real world, entities change dynamically, and the changes are captured in two dimensions: time and space. For data sets that contain temporal records, where each record is associated with a time stamp and describes some aspects of a real-world entity at that particular time, we often wish to identify records that describe the same entity over time, and thereby enable interesting longitudinal data analysis. For data sets that contain geographically referenced data describing real-world entities at different locations (i.e., location entities), we wish to link those entities that belong to the same organization or network. However, existing record linkage techniques ignore the additional evidence in temporal and spatial data and can fall short in these cases. This proposal studies linking temporal and spatial records. For temporal record linkage, we apply time decay to capture the effect of elapsed time on entity value evolution, and propose clustering methods that consider the time order of records. For linking location records, we distinguish between strong and weak evidence; for the former, we study core generation in the presence of erroneous data, and then leverage the discovered strong evidence to make the remaining decisions.
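The time-decay idea can be illustrated with a toy sketch. This is not the authors' actual model (which learns decay curves from data); the exponential form and the `half_life_days` knob are illustrative assumptions. The intuition is that evidence from a record pair counts for less as the gap between their time stamps grows, since entity values evolve.

```python
def decayed_similarity(sim, t1, t2, half_life_days=365.0):
    """Down-weight a raw record-pair similarity by elapsed time.

    sim      -- raw similarity of the two records' values, in [0, 1]
    t1, t2   -- time stamps of the two records, in days
    Returns the similarity scaled by an exponential decay factor:
    a pair observed a 'half life' apart keeps half its weight.
    """
    gap = abs(t2 - t1)                      # elapsed time in days
    decay = 0.5 ** (gap / half_life_days)   # decay factor in (0, 1]
    return sim * decay
```

A clustering step could then compare `decayed_similarity` against a threshold, so that a value disagreement between records far apart in time is penalized less than the same disagreement between near-simultaneous records.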
Citations: 2
Holistic indexing: offline, online and adaptive indexing in the same kernel
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213604
E. Petraki
Proper physical design is a momentous issue for the performance of modern database systems and applications. Nowadays, a growing number of applications require the execution of dynamic and exploratory workloads with unpredictable characteristics that change over time, e.g., social networks, scientific databases and multimedia databases. In addition, as most modern applications move into the big data era, investing time and resources in building the wrong set of indexes over large collections of data can severely affect performance. Offline, online and adaptive indexing are three distinct approaches to the problem of automating physical design choices. Offline indexing is best in static environments with stable workloads. Online indexing is best in relatively dynamic environments where the query workload can be monitored. Adaptive indexing is best in fully dynamic environments where no idle time or workload knowledge may be assumed. We observe that these three approaches are complementary, yet none of them can satisfy the needs of modern applications in isolation. We envision a new index selection approach, holistic indexing, which surpasses its predecessors by combining the best features of offline, online and adaptive indexing while overcoming their weaknesses. The main goal is the creation of a database kernel that can autonomously create partial indexes which are continuously refined during query processing, as in adaptive indexing, while the system also continuously detects opportunities to improve the physical design offline; whenever idle time occurs, it tries to exploit knowledge gathered during query processing to refine existing indexes further or create new ones. We sketch the research space and the new challenges such a direction brings.
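The adaptive-indexing component referred to above is commonly exemplified by database cracking: answering a range query physically reorganizes the indexed column as a side effect, so later queries scan less. A minimal toy sketch (not the proposed kernel design; a real cracker partitions in place and maintains a cracker index over the piece boundaries):

```python
def crack(column, low, high):
    """One 'crack' step over an unsorted column.

    Answers the range query [low, high) and, as a side effect,
    returns the column reorganized into three consecutive pieces:
    values < low, values in [low, high), values >= high.
    """
    less = [v for v in column if v < low]
    hit  = [v for v in column if low <= v < high]
    more = [v for v in column if v >= high]
    return less + hit + more, hit
```

After a few queries the column converges toward sorted order exactly where the workload needs it, which is why adaptive indexing requires no idle time or workload knowledge.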
Citations: 1
Clustering techniques for open relation extraction
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213607
F. Mesquita
This work investigates clustering techniques for Relation Extraction (RE). Relation Extraction is the task of extracting relationships among named entities (e.g., people, organizations and geo-political entities) from natural language text. We are particularly interested in the open RE scenario, where the number of target relations is too large or even unknown. Our contributions are in two aspects of the clustering process: (1) extraction and weighting of features, and (2) scalability. In order to evaluate our techniques at large scale, we propose an automatic evaluation method based on pointwise mutual information. Our preliminary results show that both our clustering techniques and our evaluation method are promising.
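The pointwise mutual information behind the proposed evaluation can be sketched from co-occurrence counts; this is the standard counts-based formulation, not necessarily the exact estimator used in the work. PMI measures how much more often two items co-occur than independence would predict, so high PMI among items grouped together suggests a coherent cluster.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with each
    probability estimated as count / total observations.
    Positive: x and y co-occur more than chance; zero: independent.
    """
    return math.log2((count_xy * total) / (count_x * count_y))
```

For example, if two relation phrases each occur 10 times in 100 sentences and co-occur exactly once, their PMI is 0 (what independence predicts); more frequent co-occurrence pushes the score above 0.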
Citations: 11
An adaptive event stream processing environment
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213613
Samujjwal Bhandari
With the increasing application of Event Stream Processing (ESP) for event pattern detection, it has become important to enhance existing ESP capabilities to deal with applications having dynamic behavior. This dissertation research explores the limitations of current ESP systems due to their fixed pattern detection mechanisms and discusses the motivating ideas that demand enhancements in ESP. We propose a solution called adaptive ESP that explores, learns, and updates evolving patterns in dynamic applications. The development of adaptive ESP requires several research issues to be addressed, such as handling input data streams, enhancing event languages with probabilistic information, using machine learning algorithms, and processing feedback from experts. We discuss these issues along with the proposed architecture for the system, and explore the research issues and some of the initial work toward developing adaptive ESP.
Citations: 4
High performance spatial query processing for large scale scientific data
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213603
Ablimit Aji, Fusheng Wang
Analyzing and querying large volumes of spatially derived data from scientific experiments has posed major challenges in the past decade. For example, the systematic analysis of imaged pathology specimens results in rich spatially derived information with GIS characteristics at cellular and sub-cellular scales, with nearly a million derived markups and a hundred million features per image. This provides critical information for the evaluation of experimental results, support of biomedical studies, and pathology-image-based diagnosis. However, the vast amount of spatially oriented morphological information poses major challenges for analytical medical imaging. The major challenges I attack include: i) How can we provide cost-effective, scalable spatial query support for medical imaging GIS? ii) How can we provide fast-response queries on analytical imaging data to support biomedical research and clinical diagnosis? iii) How can we provide expressive queries to support spatial queries and spatial pattern discovery for end users? In my thesis, I work towards developing a MapReduce-based framework, MIGIS, to support expressive, cost-effective and high-performance spatial queries. The framework includes a real-time spatial query engine, RESQUE, consisting of a variety of optimized access methods; boundary- and density-aware spatial data partitioning; a declarative query language interface; a query translator which automates the translation of spatial queries into MapReduce programs; and an execution engine which parallelizes and executes queries on Hadoop. Our preliminary experiments demonstrate that MIGIS is a cost-effective architecture which achieves high-performance spatial query execution. MIGIS is extensible and can be adapted to support similarly complex spatial queries over large-scale spatial data in other scientific domains.
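Boundary-aware spatial partitioning for MapReduce typically replicates an object to every partition its bounding box overlaps, so that each reducer can join its tile locally without cross-tile communication. A hypothetical sketch using a uniform grid (the tile size and the grid scheme are illustrative assumptions, not MIGIS's actual density-aware partitioner):

```python
def tiles_for(bbox, tile_size):
    """Map-side tile assignment for a grid-based spatial join.

    bbox is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    Returns every (tx, ty) grid tile the box overlaps; an object
    straddling a tile boundary is emitted once per overlapped tile,
    so no tile-local join misses a cross-boundary pair.
    """
    x1, y1, x2, y2 = bbox
    tx1, ty1 = int(x1 // tile_size), int(y1 // tile_size)
    tx2, ty2 = int(x2 // tile_size), int(y2 // tile_size)
    return [(tx, ty)
            for tx in range(tx1, tx2 + 1)
            for ty in range(ty1, ty2 + 1)]
```

In a MapReduce job, the mapper would emit `(tile, object)` pairs using this function as the key extractor; the reducer then runs an in-memory spatial join per tile, deduplicating pairs found in more than one tile.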
Citations: 23
Efficient optimization and processing for distributed monitoring and control applications
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213615
Mengmeng Liu
In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring that aim to monitor, control, and make decisions over large volumes of dynamic data. In my dissertation, we aim to provide a generic framework for these distributed monitoring and control applications, and to address the limitations of prior work such as data stream management systems and adaptive query processing systems. In particular, we make the following contributions: 1) supporting the maintenance of recursive queries over distributed data streams, 2) enabling full-fledged cost-based incremental query re-optimization, and 3) as ongoing work, incorporating the cost estimation of plan switching during query re-optimization. Our solutions are implemented and evaluated using our prototype system, Aspen, over a variety of workloads and benchmarks. In addition, Aspen enables an end-to-end framework to support control and decision-making over integrated data streams from both the physical world (e.g., sensor streams) and the digital world (e.g., the web, streams, databases).
Citations: 4
RecDB: towards DBMS support for online recommender systems
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213608
Mohamed Sarwat
Recommender systems have become popular in both commercial and academic settings. The main purpose of recommender systems is to suggest to users useful and interesting items or content (data) from a considerably large set of items. Traditional recommender systems do not take into account system issues (i.e., scalability and query efficiency). In an age of staggering web-use growth and ever-popular social media applications (e.g., Facebook, Google Reader), users are expressing their opinions over a diverse set of data (e.g., news stories, Facebook posts, retail purchases) faster than ever. In this paper, we propose RecDB, a fully fledged database system that provides online recommendation to users. We implement RecDB on top of the existing open source database system Apache Derby, and we showcase the effectiveness of RecDB by adopting it inside Sindbad, a Location-Based Social Networking system developed at the University of Minnesota.
Citations: 3
Foundational aspects of semantic web optimization
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213611
Sebastian Skritek
The goal of the semantic web is to make the information available on the web more easily accessible. Its idea is to provide machine-readable meta-data to enable the development of tools that support users in finding the relevant data. The goal of this thesis is to shed some light on different foundational aspects of optimization tasks occurring in the field of the Semantic Web. Examples include redundancy elimination in RDF data and static query analysis of (well-designed) SPARQL queries. Towards this goal, we have already contributed several results.
Citations: 0
Data quality and integration in collaborative environments
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213606
Gregor Endler
The trend to merge medical practices into cooperatively operating networks and organizational units like Medical Supply Centers generates new challenges for adequate IT support. In particular, new use cases for common economic planning, controlling, and treatment coordination arise. This requires the consolidation of data originating from heterogeneous and autonomous software systems. Heterogeneity and autonomy are core reasons for low data quality. The intuitive approach of initially integrating heterogeneous systems into a federated system requires a very high upfront effort before the system can become operable, and does not adequately consider the fact that data quality requirements might change over time. To remedy this, we propose an approach for continuous data quality improvement which enables a demand-driven, step-by-step system integration. By adapting the generic Total Data Quality Management process to healthcare-specific use cases, we are developing an extended model for continuous data quality management in cooperative healthcare settings. The IT tools needed to provide the information that drives this process are currently in development within a government-supported project involving both industry and academia.
Citations: 2
Towards an extensible efficient event processing kernel 一个可扩展的高效事件处理内核
Pub Date : 2012-05-20 DOI: 10.1145/2213598.2213602
Mohammad Sadoghi
The efficient processing of large collections of patterns (Boolean expressions, XPath queries, or continuous SQL queries) over data streams plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. On the one hand, emerging user-centric applications, including computational advertising and selective information dissemination, demand determining and presenting to an end-user only the most relevant content that is both user-consumable and suitable for limited screen real estate of target (mobile) devices. We achieve these user-centric requirements through novel high-dimensional indexing structures and (parallel) algorithms. On the other hand, applications in real-time data analysis, including computational finance and intrusion detection, demand meeting stringent subsecond processing requirements and providing high-frequency and low-latency event processing over data streams. We achieve real-time data analysis requirements by leveraging reconfigurable hardware -- FPGAs -- to sustain line-rate processing by exploiting unprecedented degrees of parallelism and potential for pipelining, only available through custom-built, application-specific, and low-level logic design. Finally, we conduct a comprehensive evaluation to demonstrate the superiority of our proposed techniques in comparison with state-of-the-art algorithms designed for event processing.
对数据流上的大量模式(布尔表达式、XPath查询或连续SQL查询)的高效处理在从以用户为中心的处理和个性化到实时数据分析的主要数据密集型应用程序中起着核心作用。一方面,新兴的以用户为中心的应用,包括计算广告和选择性信息传播,要求确定并向最终用户呈现最相关的内容,这些内容既是用户可消费的,又是适合目标(移动)设备有限的屏幕空间的。我们通过新颖的高维索引结构和(并行)算法来实现这些以用户为中心的需求。另一方面,实时数据分析的应用,包括计算金融和入侵检测,需要满足严格的亚秒级处理要求,并在数据流上提供高频和低延迟的事件处理。我们通过利用可重构硬件(fpga)来实现实时数据分析需求,通过利用前所未有的并行度和流水线潜力来维持线速率处理,只有通过定制的、特定于应用程序的低级逻辑设计才能实现。最后,我们进行了全面的评估,以证明我们提出的技术与为事件处理设计的最先进算法相比的优越性。
{"title":"Towards an extensible efficient event processing kernel","authors":"Mohammad Sadoghi","doi":"10.1145/2213598.2213602","DOIUrl":"https://doi.org/10.1145/2213598.2213602","url":null,"abstract":"The efficient processing of large collections of patterns (Boolean expressions, XPath queries, or continuous SQL queries) over data streams plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. On the one hand, emerging user-centric applications, including computational advertising and selective information dissemination, demand determining and presenting to an end-user only the most relevant content that is both user-consumable and suitable for limited screen real estate of target (mobile) devices. We achieve these user-centric requirements through novel high-dimensional indexing structures and (parallel) algorithms. On the other hand, applications in real-time data analysis, including computational finance and intrusion detection, demand meeting stringent subsecond processing requirements and providing high-frequency and low-latency event processing over data streams. We achieve real-time data analysis requirements by leveraging reconfigurable hardware -- FPGAs -- to sustain line-rate processing by exploiting unprecedented degrees of parallelism and potential for pipelining, only available through custom-built, application-specific, and low-level logic design. 
Finally, we conduct a comprehensive evaluation to demonstrate the superiority of our proposed techniques in comparison with state-of-the-art algorithms designed for event processing.","PeriodicalId":335125,"journal":{"name":"PhD '12","volume":"221 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122930545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
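The abstract above centers on matching large collections of Boolean-expression patterns against a stream of events. As an illustrative sketch only (not the paper's own indexing structures or FPGA designs), the classic counting-based approach to this problem indexes each predicate once and, per event, counts how many of a subscription's predicates are satisfied; the subscription names, predicate layout, and equality-only matching below are assumptions made for brevity:

```python
# Counting-based matching of conjunctive Boolean-expression subscriptions
# against stream events. Hypothetical data layout; equality predicates only.
from collections import defaultdict

def build_index(subscriptions):
    """Map each (attribute, operator, value) predicate to the subscriptions
    containing it, and record each subscription's predicate count."""
    index = defaultdict(list)
    sizes = {}
    for sub_id, predicates in subscriptions.items():
        sizes[sub_id] = len(predicates)
        for pred in predicates:
            index[pred].append(sub_id)
    return index, sizes

def match(event, index, sizes):
    """Return subscriptions whose conjunctive predicates are all satisfied
    by the event (a dict of attribute -> value)."""
    counts = defaultdict(int)
    for attr, value in event.items():
        for sub_id in index.get((attr, "=", value), ()):
            counts[sub_id] += 1
    return [s for s, c in counts.items() if c == sizes[s]]

subs = {
    "s1": [("sym", "=", "IBM"), ("side", "=", "buy")],
    "s2": [("sym", "=", "IBM")],
}
index, sizes = build_index(subs)
print(sorted(match({"sym": "IBM", "side": "buy"}, index, sizes)))  # ['s1', 's2']
print(match({"sym": "MSFT", "side": "buy"}, index, sizes))  # []
```

Each event is processed in time proportional to the predicates it satisfies rather than the total number of subscriptions, which is what makes this family of techniques viable at stream rates.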