Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404457
M. Hall, S. Kimbrough, Christian Haas, Christof Weinhardt, Simon Caton
There is an overriding interest in measuring the well-being of communities and institutions: healthy (flourishing) individuals and groups perform “better” than those that are not. Capturing the facets of well-being is, however, not straightforward: it involves personal information, sometimes with uncomfortable self-realizations associated with it. Yet the benefit of such data is the ability to observe and react to imbalances within a community, i.e., it can facilitate community management. Due to its personal nature, the observation of well-being needs to leverage carefully considered constructs. To take a comprehensive look at the concept of individual well-being, we propose a gamified frame of reference within a social network platform to lower traditional entrance barriers to data collection and encourage continued usage. In our setting, participants can record aspects of their well-being as part of their “normal” social network activities, as well as view trends for themselves and their community. To evaluate the feasibility of our approach, we present the results of an initial study conducted via Facebook.
Title: "Towards the gamification of well-being measures." 2012 IEEE 8th International Conference on E-Science, pp. 1-8.
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404482
Jun Zhao, José Manuél Gómez-Pérez, Khalid Belhajjame, G. Klyne, Esteban García-Cuesta, Aleix Garrido, K. Hettne, M. Roos, D. D. Roure, C. Goble
Workflows provide a popular means for preserving scientific methods by explicitly encoding their process. However, many are subject to decay: over time they lose the ability to be re-executed or to reproduce the same results, largely due to the volatility of the resources required for workflow execution. This paper provides an analysis of the root causes of workflow decay based on an empirical study of a collection of Taverna workflows from the myExperiment repository. Although our analysis was based on a specific type of workflow, the outcomes and methodology should be applicable to workflows from other systems, at least those whose executions also rely largely on accessing third-party resources. Based on our understanding of decay, we recommend a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object, and we provide a software tool for end-users to create such aggregations and assess their completeness.
Title: "Why workflows break — Understanding and combating decay in Taverna workflows." 2012 IEEE 8th International Conference on E-Science, pp. 1-9.
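The "aggregation object plus completeness check" idea in the abstract can be sketched in a few lines. This is a minimal illustration under assumed names (`Aggregation`, `REQUIRED_KINDS`); it is not the authors' actual tool, and the set of required auxiliary resources is an assumption for demonstration.

```python
# Hypothetical sketch: assessing the "completeness" of a workflow
# preservation aggregation, in the spirit of the decay study above.
# REQUIRED_KINDS is an illustrative choice, not the paper's actual set.

from dataclasses import dataclass, field

# A minimal set of auxiliary resources one might preserve with a workflow.
REQUIRED_KINDS = {"workflow", "example_input", "example_output", "provenance"}

@dataclass
class Aggregation:
    resources: dict = field(default_factory=dict)  # kind -> URI or local copy

    def missing(self):
        """Kinds of auxiliary resource not yet preserved."""
        return sorted(REQUIRED_KINDS - self.resources.keys())

    def is_complete(self):
        return not self.missing()

agg = Aggregation({"workflow": "wf.t2flow", "example_input": "in.csv"})
print(agg.missing())      # ['example_output', 'provenance']
print(agg.is_complete())  # False
```

A real tool would additionally dereference each URI to detect third-party resources that have already decayed.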
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404432
Elif Dede, Zacharia Fadika, Jessica Hartog, M. Govindaraju, L. Ramakrishnan, D. Gunter, S. Canon
MapReduce has, since its inception, been steadily gaining ground in scientific disciplines ranging from space exploration to protein folding. Yet the model poses a challenge for a wide range of current and legacy scientific applications in addressing their “Big Data” challenges. For example, MapReduce's best-known implementation, Apache Hadoop, only offers native support for Java applications. While Hadoop streaming supports applications written in a variety of languages such as C, C++, Python, and FORTRAN, streaming has been shown to be a less efficient MapReduce alternative in terms of performance and effectiveness. Additionally, Hadoop streaming offers fewer options than its native counterpart, and as such provides less flexibility and a more limited array of features for scientific software. The Hadoop Distributed File System (HDFS), a central pillar of Apache Hadoop, is also not a POSIX-compliant file system. In this paper, we present an alternative framework to Hadoop streaming that addresses the needs of scientific applications: MARISSA (MApReduce Implementation for Streaming Science Applications). We describe MARISSA's design and explain how it expands the set of scientific applications that can benefit from the MapReduce model. We also compare and explain the performance gains of MARISSA over Hadoop streaming.
Title: "MARISSA: MApReduce Implementation for Streaming Science Applications." 2012 IEEE 8th International Conference on E-Science, pp. 1-8.
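For readers unfamiliar with the streaming model the abstract contrasts, here is a minimal sketch of it: a streaming mapper and reducer are ordinary programs that read lines and emit key/value pairs, which the framework shuffles between them. The function names are illustrative only; this is not MARISSA's API.

```python
# Word count in the Hadoop-streaming style: the mapper emits (key, value)
# pairs per input line; the reducer sums values per key. In real streaming
# these would communicate over stdin/stdout as tab-separated text.

from collections import Counter
from typing import Iterable

def mapper(lines: Iterable[str]):
    """Emit (word, 1) for every word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum counts per key; Hadoop would deliver pairs grouped by key."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reducer(mapper(["big data big science", "big compute"]))
print(result)  # {'big': 3, 'data': 1, 'science': 1, 'compute': 1}
```

The serialization and process-boundary overhead implicit in this text-based contract is one source of the streaming inefficiency the paper measures.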
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404439
Kalev H. Leetaru
This paper examines the needs of emerging applications of High Performance Computing by the Humanities, Arts, and Social Sciences (HASS) disciplines and presents a vision for how the current academic HPC environment could be adapted to better serve this new class of “big data” research.
Title: "Towards HPC for the digital Humanities, Arts, and Social Sciences: Needs and challenges of adapting academic HPC for big data." 2012 IEEE 8th International Conference on E-Science, pp. 1-6.
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404435
D. Karastoyanova, Dimitrios Dentsas, D. Schumm, M. Sonntag, Lina Sun, Karolina Vukojevic-Haupt
The use of information technology in research and practice leads to an increased degree of task automation and makes scientific experiments more efficient in terms of cost, speed, accuracy, and flexibility. Scientific workflows have proven useful for automating scientific computations. However, not all tasks of an experiment can be automated: some decisions still need to be made by human users, for instance how an automated system should proceed in an exceptional situation. To address the need to integrate human users into such automated systems, we propose the concept of Human Communication Flows, which capture best practices for how a scientific workflow can interact with a human user. We developed a human communication framework that implements Communication Flows in a pipes-and-filters architecture and supports both notifications and request-response interactions. Different Communication Services can be plugged into the framework to account for the different communication capabilities of human users. We facilitate the use of Communication Flows within a scientific workflow by means of reusable workflow fragments implementing the interaction with the framework.
Title: "Service-based integration of human users in workflow-driven scientific experiments." 2012 IEEE 8th International Conference on E-Science, pp. 1-8.
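The pipes-and-filters notification pattern described above can be sketched briefly. All class and method names here (`Filter`, `ConsoleService`, `notify`) are assumptions for illustration, not the paper's framework.

```python
# Hedged sketch of a pipes-and-filters notification flow: each filter
# transforms the message, then a pluggable Communication Service delivers
# it. Request-response flows would add a reply channel.

class Filter:
    def apply(self, message: str) -> str:
        return message

class AddContext(Filter):
    """Example filter: prefix the message with its experiment context."""
    def __init__(self, experiment: str):
        self.experiment = experiment

    def apply(self, message):
        return f"[{self.experiment}] {message}"

class ConsoleService:
    """Stand-in Communication Service; email or SMS services could be
    plugged in for users with different communication capabilities."""
    def __init__(self):
        self.sent = []

    def deliver(self, message):
        self.sent.append(message)

def notify(filters, service, message):
    for f in filters:  # pipes-and-filters: pass through each stage in order
        message = f.apply(message)
    service.deliver(message)

svc = ConsoleService()
notify([AddContext("sim-42")], svc, "manual decision required")
print(svc.sent[0])  # [sim-42] manual decision required
```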
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404440
Eric Shook, Kalev H. Leetaru, G. Cao, Anand Padmanabhan, Shaowen Wang
The field of Culturomics exploits “big data” to explore human society at population scale. Culturomics increasingly needs to consider geographic context; this research therefore develops a geospatial visual analytics approach that transforms vast amounts of textual data into emotional heatmaps with fine-grained spatial resolution. Full-text geocoding and sentiment mining extract locations and latent “tone” from text-based data, which are combined with spatial analysis methods (kernel density estimation and spatial interpolation) to generate heatmaps that capture the interplay of location, topic, and tone toward narrative impacts. To demonstrate the effectiveness of the approach, the complete English edition of Wikipedia is processed on a supercomputer to extract all locations and tone associated with the year 2003. An emotional heatmap of Wikipedia's discussion of “armed conflict” for that year is created using these spatial analysis methods. Unlike previous research, our approach is designed for exploratory spatial analysis of topics in text archives: it incorporates multiple attributes, including the prominence of each location mentioned in the text, the density of a topic at each location compared to other topics, and the tone of the topics of interest, into a single analysis. Generating such fine-grained emotional heatmaps is computationally intensive, particularly when accounting for multiple attributes at fine scales. A CyberGIS platform based on national cyberinfrastructure in the United States is therefore used to enable the computationally intensive visual analytics.
Title: "Happy or not: Generating topic-based emotional heatmaps for Culturomics using CyberGIS." 2012 IEEE 8th International Conference on E-Science, pp. 1-6.
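The core tone-weighted kernel density step can be sketched on synthetic data. This is a minimal illustration assuming a unit square, Gaussian kernels, and made-up tone values; the paper's pipeline operates on geocoded Wikipedia text at far larger scale.

```python
# Minimal sketch of tone-weighted kernel density estimation: mentions of a
# topic at (x, y) locations, each carrying a sentiment "tone", are smoothed
# onto a grid, yielding an emotional heatmap. All data here is synthetic.

import numpy as np

def tone_heatmap(xs, ys, tones, grid_size=50, bandwidth=0.1):
    """Sum of Gaussian kernels over a unit-square grid, weighted by tone."""
    gx, gy = np.meshgrid(np.linspace(0, 1, grid_size),
                         np.linspace(0, 1, grid_size))
    heat = np.zeros_like(gx)
    for x, y, t in zip(xs, ys, tones):
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        heat += t * np.exp(-d2 / (2 * bandwidth ** 2))
    return heat

# Two positive-tone mentions clustered near (0.2, 0.2), one negative-tone
# mention near (0.8, 0.8).
heat = tone_heatmap([0.2, 0.25, 0.8], [0.2, 0.25, 0.8], [1.0, 1.0, -1.0])
print(heat.shape)  # (50, 50)
```

Regions of the grid then carry a signed intensity: positive where happy discussion clusters, negative where unhappy discussion dominates.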
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404462
B. Tierney, E. Kissel, D. M. Swany, Eric Pouyoul
Data set sizes are growing exponentially, so it is important to use the most efficient data movement protocols available. Most data movement tools today rely on TCP over sockets, which limits flows to around 20 Gbps on today's hardware. RDMA over Converged Ethernet (RoCE) is a promising new technology for high-performance network data movement with minimal CPU impact over circuit-based infrastructures. We compare the performance of TCP, UDP, UDT, and RoCE over high-latency 10 Gbps and 40 Gbps network paths, and show that RoCE-based data transfers can fill a 40 Gbps path using much less CPU than other protocols. We also show that the Linux zero-copy system calls can improve TCP performance considerably, especially on current Intel “Sandy Bridge”-based PCI Express 3.0 (Gen3) hosts.
Title: "Efficient data transfer protocols for big data." 2012 IEEE 8th International Conference on E-Science, pp. 1-9.
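One of the Linux zero-copy system calls the abstract refers to is `sendfile(2)`, which moves bytes from a file to a socket inside the kernel, avoiding the user-space copy of a `read()`/`send()` loop. The sketch below (Linux-oriented; a socketpair stands in for a real network connection) demonstrates the call, not the paper's actual benchmark harness.

```python
# Zero-copy transfer with os.sendfile: the kernel pushes file bytes
# straight to the socket, with no intermediate user-space buffer.

import os
import socket
import tempfile

payload = b"x" * 65536

# Source file to transfer (temporary, illustrative).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

left, right = socket.socketpair()  # stand-in for a real TCP connection
with open(path, "rb") as src:
    offset, remaining = 0, len(payload)
    while remaining:  # sendfile may transfer fewer bytes than requested
        sent = os.sendfile(left.fileno(), src.fileno(), offset, remaining)
        offset += sent
        remaining -= sent
left.close()

received = b""
while True:
    chunk = right.recv(65536)
    if not chunk:  # EOF after the sending side closes
        break
    received += chunk
right.close()
os.unlink(path)
print(len(received))  # 65536
```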
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404460
Jimmy Cullen, R. Hughes-Jones, R. Spencer
We describe our experiences in creating multigigabit links using the GÉANT Bandwidth on Demand (BoD) Client Portal and report measurements and analysis of the performance of connections using both FPGA- and PC-based network testing tools. This research was performed as part of a work package for the EC-funded NEXPReS project.
Title: "Verification and user experience of high data rate bandwidth-on-demand networks." 2012 IEEE 8th International Conference on E-Science, pp. 1-2.
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404428
H. Nguyen, D. Abramson
Workflow-based science gateways that bring the power of scientific workflows to the Web are becoming increasingly popular. Different IO models enabling interaction between a running workflow and a web portal have been explored. However, these are typically not dynamic enough to allow users to insert data into, or export data out of, a continuously running workflow. In this paper, we present a novel IO model that supports dynamic interaction between a workflow and its portal. We discuss a use case in which a web portal is used to control the execution of scientific workflows. This IO model will be part of our workflow-based science gateway, WorkWays.
Title: "WorkWays: Interactive workflow-based science gateways." 2012 IEEE 8th International Conference on E-Science, pp. 1-8.
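The idea of inserting data into, and exporting data out of, a continuously running workflow can be illustrated with queues feeding a long-running worker. This is only a sketch under assumed names; it is not WorkWays' IO model or API.

```python
# Illustrative dynamic IO: a "portal" injects inputs into, and drains
# outputs from, a workflow that is already running.

import queue
import threading

inputs: "queue.Queue" = queue.Queue()
outputs: "queue.Queue" = queue.Queue()

def workflow():
    """Long-running stage: consume inputs until a None sentinel arrives."""
    while True:
        item = inputs.get()
        if item is None:
            break
        outputs.put(item * 2.0)  # stand-in computation

t = threading.Thread(target=workflow)
t.start()

# The portal inserts data while the workflow is already running...
for x in (1.0, 2.0, 3.0):
    inputs.put(x)
inputs.put(None)
t.join()

# ...and exports results out of it.
results = [outputs.get() for _ in range(3)]
print(results)  # [2.0, 4.0, 6.0]
```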
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404421
Irfan Azeezullah, Friska Pambudi, Tung-Kai Shyy, Imran Azeezullah, Nigel Ward, J. Hunter, R. Stimson
The field of Spatially Integrated Social Science (SISS) recognizes that much of the data of interest to social scientists has an associated geographic location. SISS systems use geographic location as the basis for integrating heterogeneous social science data sets and for visualizing and analyzing the integrated results through mapping interfaces. However, sourcing data sets, aggregating data captured at different spatial scales, and implementing statistical analysis techniques over the data are highly complex and challenging steps, beyond the capabilities of many social scientists. The aim of the UQ SISS eResearch Facility (SISS-eRF) is to remove this burden from social scientists by providing a Web interface that allows researchers to quickly access relevant Australian socio-spatial datasets (e.g., census data, voting data), aggregate them spatially, conduct statistical modeling on the datasets, and visualize spatial distribution patterns and statistical results. This paper describes the technical architecture and components of SISS-eRF and discusses the reasons that underpin the technological choices. It describes case studies that demonstrate how SISS-eRF is being applied to test hypotheses that relate particular voting patterns to socio-economic parameters (e.g., gender, age, housing, income, education, employment, religion/culture). Finally, we outline our plans for extending and deploying SISS-eRF across the Australian social science community.
Title: "Statistical analysis and visualization services for Spatially Integrated Social Science datasets." 2012 IEEE 8th International Conference on E-Science, pp. 1-8.