Michael Smit, Mark Shtern, B. Simmons, Marin Litoiu
Enabling an Enhanced Data-as-a-Service Ecosystem
2013 IEEE Ninth World Congress on Services. Published 2013-06-28.
DOI: 10.1109/SERVICES.2013.53 (https://doi.org/10.1109/SERVICES.2013.53)
The sharing of large and interesting Big Data in cloud environments can be achieved using data-as-a-service, where a provider offers data to interested users. In enhanced data-as-a-service, the data provider also supplies compute infrastructure, allowing users to run analytics tasks local to the data and reducing the (expensive and slow) transmission of data over networks. This paper describes a services-based ecosystem that allows providers to precisely share portions of their data with users, using a model where users submit MapReduce jobs that run on the provider's Hadoop infrastructure. Providers are given mechanisms to filter, segment, and/or transform data before it reaches the user's task. The ecosystem also allows for intermediaries who offer value-added filtrations, segmentations, or transformations of the data (for example, pre-filtering a dataset to only include high-income users). We describe the RESTful services required to enable this ecosystem, introduce a prototype to demonstrate the concept, and present experiments using this ecosystem to both provide and analyze different segments of a single large data set.
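The sharing model in the abstract can be illustrated with a minimal sketch: a provider-side stage filters and transforms records, and only the resulting view is passed to the user's map and reduce functions. This is not the paper's prototype or its RESTful interface. It is a toy in-memory stand-in for a MapReduce run, and every name in it (`provider_view`, `run_job`, `high_income`, `drop_name`), the sample records, and the 90,000 income threshold are assumptions made for illustration.

```python
from collections import defaultdict


def provider_view(records, keep, transform):
    """Provider-side stage (hypothetical): filter, then transform each record
    before the user's task is allowed to see it."""
    for record in records:
        if keep(record):
            yield transform(record)


def run_job(records, mapper, reducer):
    """Toy in-memory stand-in for a MapReduce run. This is not Hadoop."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical provider data set.
DATASET = [
    {"name": "a", "region": "east", "income": 120_000},
    {"name": "b", "region": "west", "income": 40_000},
    {"name": "c", "region": "east", "income": 95_000},
    {"name": "d", "region": "west", "income": 150_000},
]


def high_income(record):
    """Filter: keep high-income users only (the threshold is an assumption)."""
    return record["income"] >= 90_000


def drop_name(record):
    """Transform: remove the identifying field before sharing."""
    return {k: v for k, v in record.items() if k != "name"}


def mapper(record):
    """User-supplied map step: emit (region, income) pairs."""
    yield record["region"], record["income"]


def reducer(key, values):
    """User-supplied reduce step: mean income per region."""
    return sum(values) / len(values)


shared = list(provider_view(DATASET, high_income, drop_name))
result = run_job(shared, mapper, reducer)
print(result)
```

In this sketch an intermediary would simply be another party supplying its own `keep` and `transform` functions on top of the provider's, so views can be layered without the user's job changing.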