Impacts of Size and History Length on Energetic Community Load Forecasting: A Case Study
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.00-61
M. Tits, Benjamin Bernaud, Amel Achour, Maher Badri, L. Guedria
In recent years, many European distribution systems (DS) have come under strain from the coupled growth of decentralized production and volatile residential appliance usage. To cope with this issue, new solutions are emerging, such as local energy storage and energetic community management. The latter aims to maximize collective self-consumption of locally produced energy through optimal planning of flexible appliances, reducing DS maintenance costs and energy loss. The quality of short-term load forecasting is key in this process; however, it depends on various factors, foremost among them the characteristics of the energetic community concerned. In this paper, we propose a methodology and a use case based on randomized sampling for the simulation of virtual energetic communities (VEC). From the numerous simulated VECs, statistical analysis allows us to assess the impact of VEC characteristics (such as size, resident type, and availability of historical data) on predictability. Using a 2-year dataset of 52 households recorded in a Belgian city, we quantify the impact of these characteristics and show that, for this specific case study, a trade-off for efficient forecasting can be reached with a community of about 10-30 households and 2-12 months of history.
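The randomized-sampling idea can be illustrated with a minimal sketch: draw random subsets of households to form virtual communities, aggregate their loads, and score a naive forecaster across community sizes and history lengths. The synthetic gamma-distributed loads and the daily-profile forecaster below are illustrative stand-ins, not the paper's data or model.

```python
import numpy as np

rng = np.random.default_rng(0)
HOURS_PER_DAY, DAYS = 24, 730          # 2-year hourly records
households = rng.gamma(2.0, 0.5, (52, DAYS * HOURS_PER_DAY))  # synthetic loads (kW)

def forecast_error(members, history_days):
    """MAPE of a naive daily-profile forecast for one virtual community."""
    load = households[members].sum(axis=0)          # aggregate community load
    train = load[:history_days * HOURS_PER_DAY].reshape(-1, HOURS_PER_DAY)
    profile = train.mean(axis=0)                    # average daily profile
    test = load[history_days * HOURS_PER_DAY:(history_days + 30) * HOURS_PER_DAY]
    pred = np.tile(profile, 30)                     # forecast the next 30 days
    return np.mean(np.abs(test - pred) / test)

for size in (5, 10, 30, 52):
    for hist in (60, 180, 365):
        errs = [forecast_error(rng.choice(52, size, replace=False), hist)
                for _ in range(20)]                 # 20 random VECs per setting
        print(f"size={size:2d} history={hist:3d}d MAPE={np.mean(errs):.2%}")
```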
Effects of Social Media Use on Health and Academic Performance Among Students at the University of Sharjah
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.0-176
S. Rahman, A. Marzouqi, Swetha Variyath, Shristee Rahman, Masud Rabbani, Sheikh Iqbal Ahamed
Statistics indicate that by 2020 almost 5 billion people will be connected to social media (SM). Studies have drawn attention to the harms of SM to the health of students: it affects their attention span, memory, sleep, vision, and overall physical, mental, and social health. In this paper, we investigate the effects of SM use on the health and academic performance of students at the University of Sharjah. This study shows that students with more self-regulation have better control over their social media use. A cross-sectional mixed approach (CSMA) combining quantitative and qualitative data was used to conduct the research. Of the 300 student participants in our study, the majority used Instagram, followed by WhatsApp and Twitter. Students reported spending an average of 3-4 hours per day on social media; however, qualitative data showed that many students spent all day on it. A majority of the students used social media to chat with friends and make new connections. They agreed that their use of social media has reduced their reading of paper-based resources and affected their grammar and writing skills. SM use delayed their bedtime, left fewer hours for sleep, and caused eyestrain, neck and shoulder pain, fatigue, poor posture, and declining physical activity. This study concludes that social media use does affect academic performance and health among students of the University of Sharjah. Considering the negative consequences of extensive social media use, universities need to create awareness programs and can incorporate this topic into health education and awareness courses. Our study also generated new information and insights about the effects of heavy SM use on health and academic performance among university students, creating opportunities for further research.
Predicting the Political Polarity of Tweets Using Supervised Machine Learning
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.000-9
Michelle Voong, Keerthana Gunda, S. Gokhale
With the advent of social media, politicians, media outlets, and ordinary citizens alike routinely turn to Twitter to share their thoughts and feelings. Discerning politically biased tweets from neutral ones can help determine the propensity of an elected official or media outlet to engage in political rhetoric. This paper presents a supervised machine learning approach to predicting whether a tweet is politically biased or neutral. The approach uses a labeled data set available at Crowdflower, where each tweet is tagged with a partisan/neutral label plus its message type and audience. It considers a combination of linguistic features, including Term Frequency-Inverse Document Frequency (TF-IDF), bigrams, and trigrams, along with metadata features such as mentions, retweets, and URLs, as well as the additional message-type and audience labels. It trains both simple and ensemble classifiers and assesses their performance using precision, recall, and F1-score. The results demonstrate that the classifiers can accurately predict the polarity of a tweet when trained on a combination of TF-IDF and metadata features that can be extracted automatically from the tweets, eliminating the need for additional manual tagging, which is cumbersome and error-prone.
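A minimal sketch of this kind of pipeline, assuming a hypothetical political_tweets.csv with text and bias columns (the actual Crowdflower schema may differ): combine TF-IDF n-grams with metadata features extracted automatically from the tweet text, then train an ensemble classifier.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("political_tweets.csv")           # hypothetical file and columns

# Linguistic features: TF-IDF over unigrams, bigrams, and trigrams.
tfidf = TfidfVectorizer(ngram_range=(1, 3), max_features=20000)
X_text = tfidf.fit_transform(df["text"])

# Metadata features extractable directly from each tweet: mentions, retweets, URLs.
meta = csr_matrix(df["text"].str.count("@").to_frame()
                  .assign(rt=df["text"].str.startswith("RT").astype(int),
                          url=df["text"].str.contains("http").astype(int)).values)

X = hstack([X_text, meta]).tocsr()
y = df["bias"]                                     # "partisan" vs. "neutral"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```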
How Much Support Can API Recommendation Methods Provide for Component-Based Synthesis?
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.0-155
Jiaxin Liu, Binbin Liu, Wei Dong, Yating Zhang, Daiyan Wang
Program synthesis is one of the key research areas in software engineering. Many approaches design domain-specific languages to constrain the program space and make the problem tractable. Although these approaches can be effective in certain domains, synthesizing programs in general-purpose programming languages remains a challenge. Component-based synthesis offers a promising way to generate such programs from a component library of application programming interfaces (APIs). However, the program space constituted by all the APIs in the library is still very large, so only small programs can be synthesized in practice. In recent years, many API recommendation approaches have been proposed that recommend relevant APIs given some specification. We argue that applying this technique to component-based synthesis is a feasible way to reduce the program space, and that the degree of support an API recommendation method provides to component-based synthesis is itself an important criterion for measuring its effectiveness. In this paper, we investigate five state-of-the-art API recommendation methods to study their effectiveness in supporting component-based synthesis. In addition, we propose API Recommendation via General Search (ARGS). We collect a set of programming tasks and compare our approach with the five API recommendation methods on synthesizing these tasks. The experimental results show that the capability of these API recommendation methods to support component-based synthesis is limited. ARGS, by contrast, supports component-based synthesis well, effectively narrowing the program space and improving the efficiency of program synthesis: it reduces synthesis time by 86.1% compared to the original SyPet.
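How API recommendation can narrow a component-based synthesizer's search space can be sketched as follows; the toy library, the TF-IDF relevance ranking, and the recommend helper are illustrative assumptions, not the paper's ARGS method or SyPet's actual component format.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy component library: API name mapped to its documentation text.
library = {
    "java.awt.geom.Area.intersect": "computes the intersection of two shapes",
    "java.awt.geom.AffineTransform.rotate": "rotates coordinates by an angle",
    "java.util.Collections.sort": "sorts a list into ascending order",
}

def recommend(query, k=2):
    """Rank library APIs by textual relevance to the synthesis task description."""
    docs = list(library.values())
    vec = TfidfVectorizer().fit(docs + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]
    ranked = sorted(zip(library, sims), key=lambda p: -p[1])
    return [name for name, _ in ranked[:k]]

# The synthesizer would then enumerate programs over this reduced set only,
# instead of the full library.
print(recommend("rotate a shape and intersect it with another"))
```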
A Hybrid Algorithms Construction of Hyper-Heuristic for Test Case Prioritization
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.000-2
Zheng Li, Yanzhao Xi, Ruilian Zhao
By scheduling algorithms from a low-level algorithm library, a hyper-heuristic can effectively select an appropriate method for hard computational search problems. A hyper-heuristic usually comprises a high-level scheduling layer and a low-level algorithm layer: the high-level strategy layer selects the algorithm for the next scheduling step by evaluating the execution results of the different low-level algorithms, while the low-level layer contains a variety of heuristic algorithms, collectively called the algorithm library. A concrete hyper-heuristic framework for multi-objective test case prioritization has been presented in which 18 multi-objective algorithms form the low-level library. It has gradually been recognized that a hybrid library combining single-objective and multi-objective optimization algorithms outperforms either type alone. This paper explores how the construction pattern of the algorithm library influences the hyper-heuristic by constructing fusion patterns of different types of algorithms.
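A minimal sketch of the two-layer structure, with an epsilon-greedy high-level layer choosing among low-level heuristics by average past reward; the heuristic names, the toy evaluate function, and the epsilon-greedy policy are assumptions for illustration, not the paper's actual scheduling strategy.

```python
import random

def hyper_heuristic(heuristics, evaluate, iterations=100, eps=0.2):
    """High-level layer: pick a low-level heuristic by past reward (epsilon-greedy)."""
    score = {h: 0.0 for h in heuristics}
    count = {h: 1 for h in heuristics}
    best, best_fit = None, float("-inf")
    for _ in range(iterations):
        if random.random() < eps:                 # explore a random heuristic
            h = random.choice(list(heuristics))
        else:                                     # exploit the best average reward
            h = max(score, key=lambda k: score[k] / count[k])
        fitness = evaluate(h)                     # run heuristic h on the test order
        score[h] += fitness
        count[h] += 1
        if fitness > best_fit:
            best, best_fit = h, fitness
    return best, best_fit

# Toy low-level library mixing single- and multi-objective prioritizers;
# evaluate would return e.g. APFD of the ordering the heuristic produces.
heuristics = ["greedy_coverage", "nsga2", "random_order"]
demo = lambda h: {"greedy_coverage": 0.8, "nsga2": 0.9,
                  "random_order": 0.5}[h] + random.gauss(0, 0.05)
print(hyper_heuristic(heuristics, demo))
```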
On the Effectiveness of Random Node Sampling in Influence Maximization on Unknown Graph
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.0-188
Yuki Wakisaka, Kazuyuki Yamashita, Sho Tsugawa, H. Ohsaki
Influence maximization in social networks has been intensively studied, motivated by its application to so-called viral marketing. The influence maximization problem is formulated as a combinatorial optimization problem on a graph that aims to identify a small set of influential nodes (i.e., seed nodes) such that the expected size of the influence cascade triggered by the seed nodes is maximized. In practice, it is generally difficult to obtain complete knowledge of a large-scale network. Therefore, the problem of identifying influential seed nodes from only a partial structure of the network, obtained through network sampling strategies, has also been studied in recent years. To achieve efficient influence propagation in unknown networks, the number of sampled nodes must be chosen appropriately. In this paper, we clarify the relation between the sample size and the expected size of the influence cascade triggered by the seed nodes through mathematical analysis. Specifically, we derive the expected size of the influence cascade under random node sampling and degree-based seed node selection. Through several numerical examples using datasets of real social networks, we also investigate the implications of our analytical results for influence maximization on unknown social networks.
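The sampling-then-seeding procedure can be sketched as follows, assuming the independent cascade model and a synthetic Barabasi-Albert graph; the paper's contribution is a mathematical derivation, so this Monte Carlo simulation is only an illustration of the setup it analyzes.

```python
import random
import networkx as nx

def ic_cascade(G, seeds, p=0.1):
    """One Monte Carlo run of the independent cascade model."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v in G.neighbors(u):
            if v not in active and random.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

G = nx.barabasi_albert_graph(1000, 3, seed=1)
sample = random.sample(list(G.nodes), 100)               # random node sampling
seeds = sorted(sample, key=G.degree, reverse=True)[:5]   # degree-based selection
spread = sum(ic_cascade(G, seeds) for _ in range(200)) / 200
print(f"expected cascade size with a 10% sample: {spread:.1f}")
```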
Detection of Change of Users in SNS by Two Dimensional CNN
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.0-159
H. Matsushita, R. Uda
In this paper, we propose a method for detecting hacked accounts in social networking services (SNS) without predetermined features, since trends in topics and slang expressions constantly change and hackers can craft messages that match any predetermined features. Prior research has detected hacked accounts and impersonation in SNS, but it either relied on predetermined features or used inappropriate evaluation procedures. In our method, by contrast, a feature named 'category' is automatically extracted from recent tweets by machine learning. We evaluated the categories with 1,000 test accounts. As a result, 74.4% of the test accounts can be detected at a rate of up to 96.0% when they are hacked and only one new message is posted. Moreover, 73.4% of the test accounts can be detected at a rate of up to 99.2% from one new posted message. Other hacked accounts can also be detected at the same rate when several messages are posted in sequence.
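A minimal sketch of a two-dimensional CNN over an embedded message matrix, in the spirit of the title; the layer sizes, the (sequence length x embedding dimension) input shape, and the two-class output are assumptions for illustration, not the paper's actual architecture or 'category' extraction.

```python
import torch
import torch.nn as nn

class ChangeDetector(nn.Module):
    """2D CNN over a (seq_len x emb_dim) matrix of embedded message tokens."""
    def __init__(self, seq_len=50, emb_dim=64, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, emb_dim)),  # n-gram-like filters
            nn.ReLU(),
            nn.AdaptiveMaxPool2d((1, 1)),                # pool over positions
        )
        self.fc = nn.Linear(16, n_classes)               # same-user vs. changed

    def forward(self, x):            # x: (batch, 1, seq_len, emb_dim)
        return self.fc(self.conv(x).flatten(1))

model = ChangeDetector()
dummy = torch.randn(8, 1, 50, 64)    # a batch of 8 embedded messages
print(model(dummy).shape)            # torch.Size([8, 2])
```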
Ensemble Random Forests Classifier for Detecting Coincidentally Correct Test Cases
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.00-72
Shuvalaxmi Dass, Xiaozhen Xue, A. Namin
The performance of coverage-based fault localization depends heavily on the quality of the test cases being executed. These test cases execute some lines of the given program and are recorded as passed or failed. In particular, some test cases may pass even while executing faulty statements. Such test cases, known as coincidentally correct test cases, can degrade the performance of spectrum-based fault localization and make it less helpful as a tool for automated debugging. In other words, the involvement of these coincidentally correct test cases introduces noise into the fault localization computation and hinders effective localization of possible bugs in the given code. In this paper, we propose a hybrid approach combining ensemble learning with a supervised learning algorithm, namely Random Forests (RF), to correctly identify test cases that are mislabeled as passing. A cost-effectiveness analysis of flipping the test status versus trimming (i.e., eliminating from the computation) the coincidentally correct test cases is also reported.
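A minimal sketch of the relabeling idea on toy data: train a Random Forest on coverage spectra labeled by test status, flag passing tests the model scores as fail-like, flip them, and recompute a suspiciousness metric (Ochiai here; the abstract does not specify the paper's exact features, threshold, or metric, and real use would cross-validate rather than score the training set).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
coverage = rng.integers(0, 2, (200, 50))     # 200 tests x 50 statements (toy data)
status = rng.integers(0, 2, 200)             # 1 = failed, 0 = passed

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(coverage, status)
p_fail = clf.predict_proba(coverage)[:, 1]

# Passing tests that "look like" failing ones are coincidentally-correct suspects.
suspects = np.where((status == 0) & (p_fail > 0.7))[0]
flipped = status.copy()
flipped[suspects] = 1                        # flip strategy (trimming drops them)

def ochiai(cov, st):
    """Per-statement Ochiai suspiciousness from a coverage matrix and statuses."""
    ef = cov[st == 1].sum(axis=0)            # failed tests covering each statement
    ep = cov[st == 0].sum(axis=0)            # passed tests covering each statement
    total_f = (st == 1).sum()
    return ef / np.sqrt(total_f * (ef + ep) + 1e-9)

print("most suspicious statement:", np.argmax(ochiai(coverage, flipped)))
```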
Adaptive OS Switching for Improving Availability During Web Traffic Surges: A Feasibility Study
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.00-97
Katsuya Matsubara, Yuhei Takagawa
Web services have seen early adoption and rapid growth with the introduction of various web system frameworks, and not only large companies but also smaller businesses and individuals can provide their own web services offering access to their products, entertainment, and information. As the number of Internet users increases, especially with the spread of smartphones, even relatively small web service infrastructures need both high access performance and high availability, although the cost of additional computational resources and redundant servers may be hard to bear, depending on the load. We focus on the fact that performance characteristics may differ depending on the internal implementation of the operating system (OS), even when the available computing resources are the same. This paper investigates the possibility of a system that achieves both high access performance and availability for a web server by dynamically switching the OS on which the server is running, without requiring additional computing resources or redundant servers. The paper identifies the differences between Linux and FreeBSD in terms of network processing and describes a mechanism for process migration among heterogeneous OSes that performs the switch. It then demonstrates the feasibility of our approach with experimental results on the performance characteristics and load tolerance of a web server in operation while the OSes are dynamically switched.
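The high-level control loop implied by this approach might look like the following sketch; the request-rate threshold, the metric source, and the switch trigger are all illustrative assumptions, and the actual checkpoint/restore migration mechanism between kernels is the subject of the paper itself.

```python
import random
import time

THRESHOLD_RPS = 5000   # illustrative surge threshold, not a figure from the paper

def current_rps():
    """Stand-in for reading real web-server request-rate metrics."""
    return random.randint(1000, 9000)

def switch_os(target):
    """Stand-in for the paper's process-migration mechanism between kernels."""
    print(f"checkpointing the web server and restoring it on {target}")

running = "Linux"
for _ in range(5):     # bounded loop so the sketch terminates
    rps = current_rps()
    if rps > THRESHOLD_RPS and running != "FreeBSD":
        switch_os("FreeBSD")   # assume FreeBSD handles this load profile better
        running = "FreeBSD"
    elif rps <= THRESHOLD_RPS and running != "Linux":
        switch_os("Linux")
        running = "Linux"
    time.sleep(0.1)
```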
Transtracer: Socket-Based Tracing of Network Dependencies Among Processes in Distributed Applications
Yuuki Tsubouchi, Masahiro Furukawa, Ryosuke Matsumoto
Pub Date: 2020-07-01 | DOI: 10.1109/COMPSAC48688.2020.00-92
Distributed applications in web services have become increasingly complex in response to various user demands. Consequently, system administrators have difficulty understanding inter-process dependencies in distributed applications: when parts of the system are changed or augmented, they cannot identify the area affected by the change, which might lead to a more damaging outage than expected. Administrators therefore need to trace dependencies among unknown processes automatically. An earlier method discovered dependencies by detecting transport connections with the Linux packet filter on the hosts at either end of a network connection. However, this adds extra delay to application traffic because of the additional packet processing in the Linux kernel. In this paper, we propose an architecture that traces dependencies by monitoring network sockets, the endpoints of TCP connections. As long as applications use the TCP protocol stack in the Linux kernel, our architecture discovers their dependencies. The monitoring only reads connection information from network sockets and is independent of application communication, so it does not affect the network delay of the applications. Our experiments confirmed that our architecture reduces delay overhead by 13-20% and resource load by 43.5% compared to earlier methods.
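The socket-monitoring idea can be sketched on Linux by reading established TCP connections from /proc/net/tcp and mapping socket inodes back to their owning processes; this is an illustrative approximation of the approach (the paper's Transtracer implementation may differ), works only on Linux, and needs root to inspect other users' processes.

```python
import glob
import os

def parse_addr(hexaddr):
    """Decode a little-endian 'IP:port' hex pair from /proc/net/tcp."""
    ip_hex, port_hex = hexaddr.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in range(6, -2, -2)]
    return ".".join(octets), int(port_hex, 16)

def established_tcp():
    """Yield (local, remote, inode) for ESTABLISHED TCP sockets."""
    with open("/proc/net/tcp") as f:
        for line in f.readlines()[1:]:           # skip the header row
            cols = line.split()
            if cols[3] == "01":                  # state 01 = ESTABLISHED
                yield parse_addr(cols[1]), parse_addr(cols[2]), int(cols[9])

def inode_to_pid(inode):
    """Map a socket inode back to the owning process via /proc/<pid>/fd."""
    target = f"socket:[{inode}]"
    for fd in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            if os.readlink(fd) == target:
                return int(fd.split("/")[2])
        except OSError:                          # process exited or no permission
            continue

for local, remote, inode in established_tcp():
    print(f"pid {inode_to_pid(inode)}: {local} -> {remote}")
```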