GEM: Graphical Explorer of MPI Programs
A. Humphrey, C. Derrick, G. Gopalakrishnan, Beth Tibbitts
DOI: 10.1145/1879211.1879248
Formal dynamic verification can complement MPI program testing by detecting hard-to-find concurrency bugs. In previous work, we described our dynamic verifier, In-situ Partial Order (ISP), which parsimoniously searches the execution space of an MPI program while detecting important classes of bugs. One major limitation of ISP, when used by itself, is the lack of a powerful and widely usable graphical front-end. We now present a new tool, the Graphical Explorer of MPI Programs (GEM), that overcomes this limitation. GEM is an Eclipse plug-in that greatly enhances the usability of ISP and brings ISP within reach of a wide array of programmers through its original release as part of the Eclipse Foundation's Parallel Tools Platform (PTP) Version 3.0 in December 2009. GEM is now part of the PTP end-user runtime. This paper describes GEM's features and architecture and summarizes usage experience with the ISP/GEM combination. Recently, we applied this combination to a widely used parallel hypergraph partitioner. Even with modest computational resources, the ISP/GEM combination finished quickly and intuitively displayed a previously unknown resource leak in this code base. We also describe the process and benefits of using GEM throughout the development cycle of our own test case, an MPI implementation of A* search. We conclude with a summary of our future plans.
{"title":"GEM: Graphical Explorer of MPI Programs","authors":"A. Humphrey, C. Derrick, G. Gopalakrishnan, Beth Tibbitts","doi":"10.1145/1879211.1879248","DOIUrl":"https://doi.org/10.1145/1879211.1879248","url":null,"abstract":"Formal dynamic verification can complement MPI program testing by detecting hard-to-find concurrency bugs. In previous work, we described our dynamic verifier called In-situ Partial Order (ISP) that can parsimoniously search the execution space of an MPI program while detecting important classes of bugs. One major limitation of ISP, when used by itself, is the lack of a powerful and widely usable graphical front-end. We now present a new tool called Graphical Explorer of MPI Programs (GEM) that overcomes this limitation. GEM is a plug-in architecture that greatly enhances the usability of ISP, and serves to bring ISP within reach of a wide array of programmers with its original release as part of the Eclipse Foundation’s Parallel Tools Platform (PTP) Version 3.0 in December, 2009. GEM is now a part of the PTP End-User Runtime. This paper describes GEM’s features, its architecture, and usage experience summary of the ISP/GEM combination. Recently, we applied this combination on a widely used parallel hypergraph partitioner. Even with modest amounts of computational resources, the ISP/GEM combination finished quickly and intuitively displayed a previously unknown resource leak in this code-base. Here, we also describe the process and benefits of using GEM throughout the development cycle of our own test case, an MPI implementation of the A* search. We conclude with a summary of our future plans.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128379895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the Limits of Tag Reduction for Energy Saving on a Multi-core Processor
Long Zheng, M. Dong, K. Ota, Huakang Li, Song Guo, M. Guo
DOI: 10.1109/ICPPW.2010.26
Saving energy usually leads to performance degradation. We explore the limits of tag reduction on a multi-core processor while guaranteeing its performance effect. In our previous work, tag reduction applied to multi-core processors showed significant energy savings but incurred performance overhead. In this paper, we find that when tag reduction is used on multi-core processors, the number of cores is the key factor affecting both energy and performance: as the number of cores integrated on the chip increases, tag reduction saves more energy but causes more performance degradation. Tag reduction therefore has limits that can be expressed in terms of the number of cores. To derive these limits, we study the relationship between energy consumption and performance overhead and propose a decision model. We build an experimental platform composed of a Linux Physical Memory Monitor (LPMM), a Trace Recorder (TR), a Scalable Multi-core Simulator (SMS), and a Data Analysis Module (DAM). We evaluate benchmarks from SPEC CPU2006 on a real operating system with the help of the LPMM and TR, obtain raw energy and performance results with the SMS, and finally use the DAM to analyze the raw results and determine the limits. Experimental results show that tag reduction should be applied to multi-core processors integrating no more than six cores; beyond that, its energy and performance efficiency degrades.
{"title":"Exploring the Limits of Tag Reduction for Energy Saving on a Multi-core Processor","authors":"Long Zheng, M. Dong, K. Ota, Huakang Li, Song Guo, M. Guo","doi":"10.1109/ICPPW.2010.26","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.26","url":null,"abstract":"Saving energy usually leads to performance degradation. We explore the limits of tag reduction on a multi-core processor with guaranteed performance effect. In our previous work, tag reduction is applied to multi-core processors and shows significant energy savings, meanwhile it causes performance overhead. In this paper, we have found out that when tag reduction is used on multi-core processors, the number of cores is the key factor that affects both energy and performance. More specifically, when tag reduction is applied to multi-core processors, as the number of core integrated into the chip increases, tag reduction can save more energy, while causes more performance degradation. Tag reduction has the limits that are represented by the number of cores. In order to derive the limits, we study the relationship between energy consumption and performance overhead and propose a decision model. We build up an experiment platform that is composed of Linux Physical Memory Monitor (LPMM), Trace Recorder (TR), Scalable Multi-core Simulator (SMS) and Data Analysis Module (DAM). We evaluate benchmarks from SPEC CPU2006 on a real operating system with help of LPMM and TR; and then get the raw results about energy and performance using SMS; finally, DAM analyzes the raw results and finds out the limits. Experimental results show that tag reduction should be applied to the multi-core processor which integrates no more than 6 cores; otherwise, the energy- and performance-efficiency of tag reduction degrades.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123235568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms
S. Sedukhin, A. Zekri, T. Miyazaki
DOI: 10.1109/ICPPW.2010.29
The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and discrete Walsh-Hadamard transform (DWHT) play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a triple matrix product with one matrix transposition. Based on a systematic approach to representing and scheduling different forms of the $n \times n$ matrix-matrix multiply-add (MMA) operation in a 3D index space, we design new orbital, highly parallel and scalable algorithms and present an efficient $n \times n$ unified array processor for computing any $n \times n$ forward/inverse discrete separable transform in the minimal $2n$ time-steps. Unlike traditional 2D systolic array processing, all $n^2$ register-stored elements of the initial/intermediate matrices are processed simultaneously by all $n^2$ processing elements of the unified array processor at each time-step. Hence the proposed array processor is appropriate for applications with naturally arranged multidimensional data, such as still images, video frames, and 2D data from a matrix sensor. Finally, we introduce a novel formulation and a highly parallel implementation of the frequently required matrix data alignment and manipulation using MMA operations on the same array processor, so that no additional circuitry is needed.
{"title":"Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms","authors":"S. Sedukhin, A. Zekri, T. Miyazaki","doi":"10.1109/ICPPW.2010.29","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.29","url":null,"abstract":"The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), discrete Walsh-Hadamard transform (DWHT), play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a triple matrix product with one matrix transposition. Based on a systematic approach to represent and schedule different forms of the $ntimes n$ matrix-matrix multiply-add (MMA) operation in 3D index space, we design new orbital highly-parallel/scalable algorithms and present an efficient $ntimes n$ unified array processor for computing {it any} $ntimes n$ forward/inverse discrete separable transform in the minimal $2n$ time-steps. Unlike traditional 2D systolic array processing, all $n^2$ register-stored elements of initial/intermediate matrices are processed simultaneously by all $n^2$ processing elements of the unified array processor at each time-step. Hence the proposed array processor is appropriate for applications with naturally arranged multidimensional data such as still images, video frames, 2D data from a matrix sensor, etc. Ultimately, we introduce a novel formulation and a highly-parallel implementation of the frequently required matrix data alignment and manipulation by using MMA operations on the same array processor so that no additional circuitry is needed.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122148212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Localization with Rotatable Directional Antennas for Wireless Sensor Networks
Jehn-Ruey Jiang, Chih-Ming Lin, Yi-Jia Hsu
DOI: 10.1109/ICPPW.2010.79
In this paper we present the design and implementation of a novel localization scheme, called Rotatable Antenna Localization (RAL), for wireless sensor networks (WSNs) whose beacon nodes are equipped with regularly rotating directional antennas. A beacon node periodically sends beacon signals containing its position and antenna orientation. By observing the variation of the received signal strength indication (RSSI) values of the beacon signals, a sensor node can estimate its orientation relative to the beacon node. With the estimated orientations and exact positions of two distinct beacon nodes, a sensor can calculate its own location. We propose and implement four methods for a sensor node to estimate its orientation; among them, the strongest-signal (SS) method gives the most accurate estimates. Using the SS method, we implement the RAL scheme and apply it to a WSN in a 10-by-10-meter indoor environment with two beacon nodes at the two ends of one side. Our experiments show that the average position estimation error of RAL is 76 centimeters. We further propose two methods, grid-based and vector-based approximation, that improve RAL by installing more than two beacon nodes; simulations show these improvements reduce the position error by about 10%.
{"title":"Localization with Rotatable Directional Antennas for Wireless Sensor Networks","authors":"Jehn-Ruey Jiang, Chih-Ming Lin, Yi-Jia Hsu","doi":"10.1109/ICPPW.2010.79","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.79","url":null,"abstract":"In this paper we show the design and implementation of a novel localization scheme, called Rotatable Antenna Localization (RAL), for a wireless sensor network (WSN) with beacon nodes with directional antennas which rotate regularly. A beacon node periodically sends beacon signals containing its position and antenna orientations. By observing the variation of the received signal strength indication (RSSI) values of the beacon signals, a sensor node can estimate the orientation relative to the beacon node. With the estimated orientations and exact positions of two distinct beacon nodes, a sensor can calculate its own location. Four methods are proposed and implemented for the sensor node to estimate its orientations. Among them, we find that the strongest-signal (SS) method has the most accurate orientation estimation. With SS method, we implement RAL scheme and apply it to a WSN in a 10- by 10-meter indoor environment with two beacon nodes at two ends of a side. Our experiment demonstrates that the average position estimation error of RAL is 76 centimeters. We further propose two methods, namely grid- and vector-based approximation methods, to improve RAL by installing more than two beacon nodes. We show by simulation that the improvements can reduce about 10% of the position error.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129850627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scaling Linear Algebra Kernels Using Remote Memory Access
M. Krishnan, R. Lewis, Abhinav Vishnu
DOI: 10.1109/ICPPW.2010.57
This paper describes the scalability of linear algebra kernels based on a remote memory access approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing, and it is suitable for clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux-InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, our RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
{"title":"Scaling Linear Algebra Kernels Using Remote Memory Access","authors":"M. Krishnan, R. Lewis, Abhinav Vishnu","doi":"10.1109/ICPPW.2010.57","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.57","url":null,"abstract":"This paper describes the scalability of linear algebra kernels based on remote memory access approach. The current approach differs from the other linear algebra algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message passing. It is suitable for clusters and scalable shared memory systems. The experimental results on large scale systems (Linux-Infiniband cluster, Cray XT) demonstrate consistent performance advantages over ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms used today. For example, on a Cray XT4 for a matrix size of 102400, our RMA-based matrix multiplication achieved over 55 tera???ops while ScaLAPACK’s pdgemm measured close to 42 tera???ops on 10000 processes.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129097163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality of Surveillance Measures in K-Covered Heterogeneous Wireless Sensor Networks
M. Wueng, I. Hwang
DOI: 10.1109/ICPPW.2010.82
Heterogeneous wireless sensor networks (HWSNs), in which the deployed sensors have different capacities, are increasingly used for critical real-world surveillance. To conserve energy, powerful sensors are usually activated only when an event is detected, while low-cost, error-prone sensors dominate the quality of surveillance (QoSu) when an event of interest first appears. To guarantee a desired QoSu, deploying K-coverage configurations in HWSNs has attracted much attention. However, little work addresses the significant issue of measuring the fault-tolerance level of a K-coverage configuration in HWSNs. In this paper, we first propose an energy-efficient eligibility approach that constructs K-covered HWSNs at very low cost. The QoSu is then formalized in terms of explicit metrics, such as the probabilities of system false positives and system false negatives. An appropriate K-coverage deployment can thus be determined according to a desired QoSu while prolonging the system lifetime.
{"title":"Quality of Surveillance Measures in K-Covered Heterogeneous Wireless Sensor Networks","authors":"M. Wueng, I. Hwang","doi":"10.1109/ICPPW.2010.82","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.82","url":null,"abstract":"Heterogeneous Wireless Sensor Networks, in which the deployed sensors have different capacities, are gradually used to perform critical surveillance in real world. For conserving energy, powerful sensors are usually activated only when an event is detected, while low-cost and error-prone sensors dominate the quality of surveillance (QoSu) when an interesting event just appears. To guarantee a desired QoSu, deploying the K-coverage configuration in HWSNs has attracted much attention. However, little work addresses a significant issue of measuring the fault tolerance level on a K-coverage configuration in HWSNs. In this paper, we first propose an energy-efficient eligibility approach to perform the K-covered HWSNs with very low cost. The QoSu is further formalized in terms of explicit metrics, such as probabilities of system false positives and system false negatives. An appropriate deployment of the K-coverage configuration can thus be determined according to a desired QoSu while prolonging the system lifetime.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121451056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mixed-Tool Performance Analysis on Hybrid Multicore Architectures
Peng Du, P. Luszczek, S. Tomov, J. Dongarra
DOI: 10.1109/ICPPW.2010.41
This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By recursively inverting diagonal blocks, the algorithm works with a tunable block size to achieve the best performance. We show various ways to use existing profiling tools to measure and analyze the performance of this algorithm, drawing on some of the most popular CPU and GPU profilers for their respective strengths and compensating for their weaknesses with several new techniques for analyzing the performance and interaction of different application components. The presented methodologies yield insights that help us understand and tune the proposed algorithm, considerably improving the performance of both the solver itself and the application that uses it.
{"title":"Mixed-Tool Performance Analysis on Hybrid Multicore Architectures","authors":"Peng Du, P. Luszczek, S. Tomov, J. Dongarra","doi":"10.1109/ICPPW.2010.41","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.41","url":null,"abstract":"This paper proposes a triangular solve algorithm with variable block size for graphics processing unit (GPU). By using diagonal blocks inversion with recursion, this algorithm works with tunable block size to achieve the best performance. Various methods are shown on how to make use of existing profiling tools to successfully measure and analyze performance of this algorithm. We use some of the most popular CPU and GPU profiling tools for their advantages and overcome their disadvantages with several new techniques to analyze the performance and relationship of different components of applications. With the presented methodologies, insight information is produced which helps to understand and tune the proposed algorithm and considerably improve the performance of the solver itself as well as the application using it.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125995677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Better Performance from Scheduling Threads According to Resource Demands in MMMP
L. Weng, Chen Liu
DOI: 10.1109/ICPPW.2010.53
A multi-core multi-threading microprocessor (MMMP) not only shares resources, e.g., computation units and private caches, among threads on the same core but also isolates those resources between cores. Moreover, when the simultaneous multithreading (SMT) architecture is employed, the execution resources are fully shared among the threads executing concurrently on a core, while the isolation across cores worsens as the number of cores increases. Although fetch policies, which assign priorities in the fetch stage, are carefully designed to manage the shared resources within a core, it is the scheduling policy that makes the distributed resources available to workloads by deciding how threads are assigned to cores. Threads consume different resources in different phases, and we use Cycles Per Instruction spent on Memory (CPImem) to express their resource demands. Aiming at better performance by scheduling according to these demands, we propose Mix-Scheduling, which evenly mixes threads across cores to achieve thread diversity, i.e., CPImem diversity, in every core. In our experiments, Mix-Scheduling improves overall system throughput by 63% and average thread performance by 27% relative to the reference Mono-Scheduling policy, which keeps CPImem uniform among the threads on each core. Mix-Scheduling also takes an essential step toward shortening load latency by reducing the L2 cache miss rate by 6% compared with Mono-Scheduling.
{"title":"On Better Performance from Scheduling Threads According to Resource Demands in MMMP","authors":"L. Weng, Chen Liu","doi":"10.1109/ICPPW.2010.53","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.53","url":null,"abstract":"The Multi-core Multi-threading Microprocessor introduces not only resource sharing to threads in the same core, e.g., computation resources and private caches, but also isolates those resources within different cores. Moreover, when the Simultaneous Multithreading architecture is employed, the execution resources are fully shared among the concurrently executing threads in the same core, while the isolation is worsened as the number of cores increases. Even though fetch policies regarding how to assign priorities in fetch stage are well designed to manage the shared resources in a core, it is actually the scheduling policy that makes the distributed resources available for workloads, through deciding how to send their threads to cores. On the other hand, threads consume various resources in different phases and Cycles Per Instruction Spent on Memory (CPImem) is used to express their resource demands. Consequently, aiming at better performance via scheduling according to their resource demands, we propose the Mix-Scheduling to evenly mix threads across cores, so that it achieves thread diversity, i.e., CPImem diversity in every core. As a result, it is observed in our experiment that 63% improvement in overall system throughput and 27% improvement in average thread performance, when comparing the Mix-Scheduling policy with the reference policy Mono-Scheduling, which keeps CPImem uniformity among threads in every core on chips. Furthermore, the Mix-Scheduling also makes an essential step towards shortening load latency, because it succeeds in reducing the L2 Cache Miss Rate by 6% from Mono-Scheduling.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115741639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smartphone Evolution and Reuse: Establishing a More Sustainable Model
Xun Li, Pablo J. Ortiz, Jeffrey Browne, Diana Franklin, J. Oliver, R. Geyer, Yuanyuan Zhou, F. Chong
DOI: 10.1109/ICPPW.2010.70
The dark side of Moore's Law is our society's insatiable need to constantly upgrade its computing devices. The high cost in manufacturing energy, materials, and disposal is made all the more worrisome by the growing number of smartphones. Repurposing smartphones for educational use is a promising idea that has shown success in recent years. Our previous work showed that although components in smartphones degrade with use, their functionality, available resources, and power supplies can still satisfy the requirements of educational applications. In this study, we demonstrate the potential benefits of reusing smartphones by analyzing their manufacturing and lifetime energy. The key challenge is designing software that can adapt to the extreme heterogeneity of devices. We characterize the heterogeneity among different generations of smartphones from HTC and Apple, including processing capability, storage resources, and feature sets, and we offer insights toward establishing a sustainable model for designing mobile applications for phone reuse.
{"title":"Smartphone Evolution and Reuse: Establishing a More Sustainable Model","authors":"Xun Li, Pablo J. Ortiz, Jeffrey Browne, Diana Franklin, J. Oliver, R. Geyer, Yuanyuan Zhou, F. Chong","doi":"10.1109/ICPPW.2010.70","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.70","url":null,"abstract":"The dark side of Moore's Law is our society's insatiable need to constantly upgrade our computing devices. The high cost in manufacturing energy, materials and disposal is more worrisome the increasing number of smartphones. Repurposing smartphones for educational purpose is a promising idea and shown success in recent years. Our previous work has shown that although different components in smartphones degrade from use, their functionalities, available resources and power supplies are still able to satisfy the requirement of educational applications. In this study, we demonstrate the potential benefits of reusing smartphones by analyzing their manufacturing and life-time energy. The key challenge is the design of software that can adapt to extreme heterogeneity of devices. We also characterize different types of heterogeneities among different generations of smartphones from HTC and Apple, including processing capability, storage resource and various features. We propose insights to aid establishing a sustainable model of designing mobile applications for phone reuse.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115850226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collaborative Spatial Object Recommendation in Location Based Services
G. Gupta, Wang-Chien Lee
DOI: 10.1109/ICPPW.2010.16
Recommendation systems have found their way into many online web applications, e.g., product recommendation on Amazon and movie recommendation on Netflix. In particular, collaborative filtering techniques are widely used in these systems to personalize recommendations according to the needs and tastes of users. In this paper, we apply collaborative filtering to spatial object recommendation, which is essential in many location-based services. Due to the large number of spatial objects and participating users, using collaborative filtering to obtain recommendations for a particular user can be very expensive. However, we observe that users tend to have an affinity for certain regions and argue that restricting attention to users with similar regional bias can reduce the search space of similar users. We therefore propose two techniques, Access Minimum Bounding Rectangle Overlapped Area (AMBROA) and Grid Division Cosine Similarity (GDCS), to form regions of interest that represent user location interests and activities and to find users with similar local access patterns, facilitating effective spatial object recommendation. An extensive performance evaluation validates our ideas and demonstrates the superiority of our proposal over the conventional approach.
{"title":"Collaborative Spatial Object Recommendation in Location Based Services","authors":"G. Gupta, Wang-Chien Lee","doi":"10.1109/ICPPW.2010.16","DOIUrl":"https://doi.org/10.1109/ICPPW.2010.16","url":null,"abstract":"Recommendation systems have found their ways into many on-line web applications, e.g., product recommendation on Amazon and movie recommendation on Netflix. Particularly, collaborative filtering techniques have been widely used in these systems to personalize the recommendations according to the needs and tastes of users. In this paper, we apply collaborative filtering in spatial object recommendation which is essential in many location based services. Due to the large number of spatial objects and participating users, using collaborative filtering to obtain recommendations for a particular user can be very expensive. However, we observe that users tend to have affinity for some regions and argue that using users with similar regional bias in recommendation may help in reducing the search space of similar users. Thus, we propose two techniques, namely, Access Minimum Bounding Rectangle Overlapped Area (AMBROA) and Grid Division Cosine Similarity (GDCS), to form regions of interests that represent user location interests and activities and to find users with local access similarity to facilitate effective spatial object recommendation. We conduct an extensive performance evaluation to validate our ideas. Evaluation result demonstrates the superiority of our proposal over the conventional approach.","PeriodicalId":415472,"journal":{"name":"2010 39th International Conference on Parallel Processing Workshops","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132104998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}