On the influence of social factors on team recommendations
Michele Brocco, Georg Groh, C. Kern
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452716
Over the last ten years, a new paradigm for creating innovations that also draws on external sources and paths to market has emerged and become popular. This paradigm is known as open innovation. By including these external sources in the innovation process, a larger pool of people (and thereby knowledge and skills) becomes available. People and organizations are connected in a network of collaboration, a so-called open innovation network. These networks are valuable and provide an important source for composing teams that work on specific open innovation projects within an open innovation community. We address the problem of composing such a team, given the complexity of the network and of innovation tasks, with algorithmic team recommendation. Several challenges must be considered, such as incorporating the different aspects of team composition that have been studied in the social and psychological sciences. We base this article on our previous work on categorizing the aspects that influence team composition, and we create a team composition model based solely on social aspects as an example of mapping classical team composition models onto our categorization. Furthermore, we describe typical issues that arise when creating team composition models from scratch and mapping them onto our proposed meta-model, which is the main component of our recommender approach.
Towards discovery of eras in social networks
M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, D. Pedreschi
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452713
In recent decades, much research has been devoted to topics in Social Network Analysis. One important direction in this area is analyzing the temporal evolution of a network. So far, previous approaches have analyzed this setting at both the global and the local level. In this paper, we focus on detecting temporal eras in an evolving network. We lay the basis for a general framework that helps the analyst browse temporal clusters both top-down and bottom-up, exploring the network at any level of temporal detail. We show the effectiveness of our approach on real data by applying the proposed methodology to a co-authorship network extracted from a bibliographic dataset. Our first results are encouraging and open the way for the definition and implementation of a general framework for discovering eras in evolving social networks.
Evaluating path queries over route collections
Panagiotis Bouros, Y. Vassiliou
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452732
Nowadays, vast amounts of routing data, such as sequences of points of interest, landmarks, etc., are available due to the proliferation of geodata services. We refer to these sequences as routes and to the points involved simply as nodes. In this thesis, we consider the problem of evaluating path queries over frequently updated route collections. We present our current work on two path queries: (i) identifying a path between two nodes of the collection, and (ii) identifying a constrained shortest path. Finally, we describe some interesting open problems and clearly state our future research directions.
On dynamic data clustering and visualization using swarm intelligence
Esin Saka, O. Nasraoui
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452721
Clustering and visualizing high-dimensional sparse data simultaneously is a very attractive goal, yet it is also a challenging problem. Our previous studies using a special type of swarm, known as flocks of agents, provided promising approaches to this problem on several limited-size UCI machine learning data sets and on Web usage sessions from web access logs [1], [2]. However, dynamic domains, such as practically any data generated on the Web, may require frequent, costly updates of the clusters (and of the visualization) whenever new data records are added to the dataset. New data may result from user activity on a website (clickstreams) or a search engine (queries), or from new Web pages in the case of document clustering. Additionally, incoming records may change the clustering over time, so clusters may need to be updated, leading to the need to mine dynamic clusters. This paper summarizes our initial studies in designing a simultaneous clustering and visualization algorithm and proposes the Dynamic-FClust algorithm, which is based on flocks of agents as a biological metaphor. This algorithm belongs to the swarm-based clustering family, which is unique among clustering approaches because its model is an ongoing swarm of agents that socially interact with each other and is therefore inherently dynamic.
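To give a flavor of the flock metaphor (this is not the actual Dynamic-FClust algorithm), a single flocking step might move each agent in the 2-D visualization plane toward nearby agents carrying similar data records and away from dissimilar ones, so that clusters emerge and the layout can keep adapting as records arrive. All parameters and the similarity encoding below are illustrative assumptions:

```python
import math

def flock_step(positions, data_sim, step=0.1, radius=1.0):
    """One flock update over 2-D agent positions.

    positions: list of (x, y) agent coordinates.
    data_sim:  data_sim[i][j] in [0, 1], similarity of records i and j.
    Similar neighbours (sim > 0.5) attract; dissimilar ones repel.
    """
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        dx = dy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dist = math.hypot(xj - xi, yj - yi)
            if dist == 0 or dist > radius:
                continue  # only local neighbours influence an agent
            weight = data_sim[i][j] - 0.5
            dx += weight * (xj - xi) / dist
            dy += weight * (yj - yi) / dist
        new_positions.append((xi + step * dx, yi + step * dy))
    return new_positions
```

Iterating this step makes agents with similar records drift together, which is the core of the "ongoing swarm" idea; the published algorithm adds the mechanisms needed for true dynamic data.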
Ontology alignment argumentation with mutual dependency between arguments and mappings
P. Maio, Nuno Silva
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452705
For successful communication, autonomous entities (e.g. agents, web services, peers) must reconcile the vocabulary used in their ontologies. The result is a set of mappings between ontology entities. Since each party may have its own perspective on which mappings are best, conflicts will arise. Toward building consensus on mappings between information-exchanging parties, this paper proposes an approach based on a formal argumentation framework, in which existing ontology matching algorithms generate the mappings, which are then interpreted as semantic arguments employed during the argumentation. The proposal models a mutual dependency between the mappings and the arguments, which goes beyond the state of the art in argumentation-based ontology alignment negotiation and better reflects the requirements of the task.
A first step towards integration independence
L. Haas, Renée J. Miller, Donald Kossmann, Martin Hentschel
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452753
Two major forms of information integration, federation and materialization, continue to dominate the market, embedded in separate products, each with its strengths and weaknesses. Application developers must make difficult choices among techniques and products, choices that are hard to change later. We propose a new design principle for integration engines: Integration Independence. Integration independence frees the application designer from deciding how to integrate data. We then describe a new, adaptive information integration engine that can either index base data or materialize transformed data, giving us a flexible platform for experimentation.
Coordination of data in heterogenous domains
Michael K. Lawrence, R. Pottinger, S. Staub-French
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452757
Existing semantic integration approaches to coordinating data do not meet the needs of real-world scenarios that contain fine-grained relationships between data sources. In this paper, we describe extensions to the popular GLAV mapping formalism to express such relationships. We outline methods for solving the data coordination problem using these mappings, and we discuss future research problems that must be addressed for data coordination to be realized in the heterogeneous domain scenarios that occur in practice.
Towards best-effort merge of taxonomically organized data
D. Thau, S. Bowers, Bertram Ludäscher
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452756
We consider the task of merging datasets that have been organized using different, but aligned, taxonomies. We assume such a merge is intended to create a single dataset that unambiguously describes the information in the source datasets using the alignment. We also assume that the merged result should reflect the observations in the source datasets as specifically as possible. Typically, there is no single merge result that is both unambiguous and maximally specific. In this case, a user may be presented with a set of possible merged datasets; if the user requires a single dataset, that dataset loses specificity. Here we examine whether the data exchange setting can provide a way to derive a "best-effort" merge. We find that the data exchange setting might be a good candidate for providing the merge, but further research is needed.
Statistics-driven workload modeling for the Cloud
A. Ganapathi, Yanpei Chen, A. Fox, R. Katz, D. Patterson
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452742
A recent trend in data-intensive computing is the use of pay-as-you-go execution environments that scale transparently to the user. However, providers of such environments must tackle the challenge of configuring their systems to provide maximal performance while minimizing the cost of the resources used. In this paper, we use statistical models to predict resource requirements for Cloud computing applications. Such a prediction framework can guide system design and deployment decisions such as scale, scheduling, and capacity. In addition, we present the initial design of a workload generator that can be used to evaluate alternative configurations without the overhead of reproducing a real workload. This paper focuses on statistical modeling and its application to data-intensive workloads.
A generic auto-provisioning framework for cloud databases
Jennie Duggan, Olga Papaemmanouil, U. Çetintemel
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452746
We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-as-a-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system's operational cost, estimated from the IaaS provider's pricing model, while satisfying QoS expectations. Specifically, we describe two approaches: a “white-box” approach that uses a fine-grained estimate of the expected resource consumption of a workload, and a “black-box” approach that relies on coarse-grained profiling to characterize the workload's end-to-end performance across various cloud resources. We formalize both approaches as constraint programming problems and use a generic constraint solver to tackle them efficiently. We present preliminary experimental results, obtained by running TPC-H queries with PostgreSQL on Amazon's EC2, that provide evidence of the feasibility and utility of our approaches. We also briefly discuss the pertinent challenges and directions of ongoing research.