J. Bičevskis, Zane Bicevska, Anastasija Nikiforova, Ivo Oditis
This paper discusses data quality checking during business process execution by means of runtime verification. While runtime verification checks that a business process executes correctly, data quality checks ensure that a particular process has not negatively affected the stored data. Both runtime verification and data quality checks run in parallel with the base processes and affect them only insignificantly. The proposed idea allows verifying (a) whether the process ended correctly and (b) whether a correctly completed process has not degraded the stored data through the modifications it made. The desired result will be achieved by using domain-specific languages that describe runtime verification and data quality checks at every stage of business process execution.
{"title":"Towards Data Quality Runtime Verification","authors":"J. Bičevskis, Zane Bicevska, Anastasija Nikiforova, Ivo Oditis","doi":"10.15439/2019F168","DOIUrl":"https://doi.org/10.15439/2019F168","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
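The two checks named in the data quality abstract above, (a) that the process ended correctly and (b) that the stored data was not degraded, can be illustrated with a minimal Python sketch. It is not the authors' domain-specific language: `QualityRule`, `run_with_checks`, `add_invoice` and the rules themselves are hypothetical names chosen for illustration, and the checks run sequentially after the step rather than in parallel with it.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # one stored data object, e.g. a database row


@dataclass(frozen=True)
class QualityRule:
    """A named predicate that every stored record is expected to satisfy."""
    name: str
    holds: Callable[[Record], bool]


def run_with_checks(step, store, rules):
    """Run one process step, then check the data it left behind.

    `step` is a hypothetical callable that mutates `store` and returns
    True when it considers itself correctly completed.
    """
    process_ok = bool(step(store))  # (a) did the step end correctly?
    violations = [                  # (b) which records now break which rule?
        (index, rule.name)
        for index, record in enumerate(store)
        for rule in rules
        if not rule.holds(record)
    ]
    return {"process_ok": process_ok, "violations": violations}


def add_invoice(store):
    """Illustrative step: finishes normally but stores a negative amount."""
    store.append({"id": 2, "amount": -5.0})
    return True


RULES = [
    QualityRule("amount_non_negative", lambda r: r["amount"] >= 0),
    QualityRule("id_present", lambda r: r.get("id") is not None),
]

store = [{"id": 1, "amount": 10.0}]
report = run_with_checks(add_invoice, store, RULES)
print(report)
```

In this sketch the step reports success, yet the data quality check still flags the record it wrote, which is the situation the abstract distinguishes from a process failure.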
Aleksandra Kowalska, Piotr Łuczak, Dawid Sielski, T. Kowalski, A. Romanowski, D. Sankowski
This paper presents an overview of what Big Data can bring to modern industry. By following the history of contemporary Big Data frameworks, the authors observe that the available tools have reached sufficient maturity to be usable in an industrial setting. The authors propose the concept of a system for collecting, organising, processing and analysing experimental data obtained from measurements with process tomography. Process tomography is used for noninvasive flow monitoring and data acquisition. The measurement data is collected, stored and processed to identify process regimes and process threats. Further general examples of solutions that take advantage of such tools are presented as proof of the viability of such an approach. As the first step in creating the proposed system, a scalable, distributed, containerisation-based cluster has been constructed from consumer-grade hardware.
{"title":"Towards Big Data Solutions for Industrial Tomography Data Processing","authors":"Aleksandra Kowalska, Piotr Łuczak, Dawid Sielski, T. Kowalski, A. Romanowski, D. Sankowski","doi":"10.15439/2019F310","DOIUrl":"https://doi.org/10.15439/2019F310","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Big data processing in the Smart Grid context has many large-scale applications that require real-time data analysis (e.g., detection of intrusion and data injection attacks, electric device health monitoring). In this paper, we present a big data platform for anomaly detection in power consumption data. The platform is based on an ingestion layer with data densification options, Apache Flink as part of the speed layer, and HDFS/KairosDB as data storage layers. We showcase the application of the platform to a power consumption anomaly detection scenario, benchmarking alternative frameworks used at the speed layer (Flink, Storm, Spark).
{"title":"Big Data Platform for Smart Grids Power Consumption Anomaly Detection","authors":"Jakub Lipcak, M. Macák, B. Rossi","doi":"10.15439/2019F210","DOIUrl":"https://doi.org/10.15439/2019F210","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
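The Smart Grid abstract above does not say which anomaly detection algorithm the platform runs, so the following Python sketch swaps in a simple stand-in: a trailing-window z-score over a stream of consumption readings. The function name `detect_anomalies`, the window size, the threshold and the sample readings are all assumptions made for illustration, and the sketch runs in plain Python rather than on Flink, Storm or Spark.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing-window mean
    by more than `threshold` population standard deviations."""
    history = deque(maxlen=window)  # the most recent `window` readings
    anomalies = []
    for index, value in enumerate(readings):
        if len(history) == window:  # only judge once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        history.append(value)       # the reading joins the window afterwards
    return anomalies


# Hypothetical consumption values with one obvious spike.
readings = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 9.5, 1.0, 1.1]
anomalies = detect_anomalies(readings)
print(anomalies)
```

Processing one reading at a time against a bounded window mirrors the constraint of a speed layer, where each event has to be judged without access to the full history.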
A. Eilertsen, Dennis Højbjerg Rose, Peter Langballe Erichsen, Rasmus Engesgaard Christensen, Rudra Pratap Deb Nath
There is currently a lack of research concerning whether Emotional Classification (EC) research on one language is applicable to other languages. We define this area as cross-language EC. If such research is applicable, the amount of research needed for different languages can be greatly reduced. Therefore, we propose a framework to answer the following null hypothesis: the change in classification accuracy for Emotional Classification caused by changing a single preprocessor or classifier is independent of the target language, at a significance level of p = 0.05. We test this hypothesis using an English and a Danish data set and the classification algorithms Support-Vector Machine, Naive Bayes, and Random Forest. Our statistical test yielded a p-value of 0.12852, so we could not reject our hypothesis. Thus, our hypothesis could still be true. More research is therefore needed within the field of cross-language EC in order to benefit EC for different languages.
{"title":"Languages’ Impact on Emotional Classification Methods","authors":"A. Eilertsen, Dennis Højbjerg Rose, Peter Langballe Erichsen, Rasmus Engesgaard Christensen, Rudra Pratap Deb Nath","doi":"10.15439/2019F143","DOIUrl":"https://doi.org/10.15439/2019F143","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
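The decision reported in the Emotional Classification abstract above follows the usual rule for a significance test: the null hypothesis is rejected only when the p-value falls below the chosen level. A minimal Python sketch of that rule, using the two numbers stated in the abstract (the function name `reject_null` is a hypothetical helper, and the statistical test that produced the p-value is not reproduced here):

```python
ALPHA = 0.05       # significance level stated in the abstract
P_VALUE = 0.12852  # p-value reported in the abstract


def reject_null(p_value, alpha):
    """Standard decision rule: reject the null hypothesis only if p < alpha."""
    return p_value < alpha


if reject_null(P_VALUE, ALPHA):
    print("Reject the null hypothesis.")
else:
    # Failing to reject is not proof that the null hypothesis is true.
    print("Cannot reject the null hypothesis.")
```

Since 0.12852 is not below 0.05, the rule does not reject, which matches the abstract's cautious wording that the hypothesis "could still be true".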
To augment source code with high-level metadata and thereby facilitate program comprehension, a programmer can use annotations. There are several types of annotations: those put directly in the code and external ones. Each type comes with a unique workflow and inherent limitations. In this paper, we present a tool that provides a uniform annotation process and also adds custom metadata-awareness to an industrial IDE. We also report an experiment in which we examined whether the resulting annotation support helps programmers annotate code with comments faster and more consistently. The experiment showed that with the tool the annotation consistency was significantly higher, but that the increase in annotation speed was not statistically significant.
{"title":"Supporting Source Code Annotations with Metadata-Aware Development Environment","authors":"Ján Juhár","doi":"10.15439/2019F161","DOIUrl":"https://doi.org/10.15439/2019F161","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Context is widely considered in NLP and knowledge discovery since it strongly influences the exact meaning of natural language. The scientific challenge is not only to extract such context data, but also to store this data for further NLP approaches. Here, we propose a multi-step knowledge graph-based approach to utilize context data for NLP and for knowledge expression and extraction. We introduce the graph-theoretic foundation for a general context concept within semantic networks and show a proof of concept based on biomedical literature and text mining. We discuss the impact of this novel approach on text analysis, various forms of text recognition, and knowledge extraction and retrieval.
{"title":"Knowledge Extraction and Applications utilizing Context Data in Knowledge Graphs","authors":"Jens Dörpinghaus, Andreas Stefan","doi":"10.15439/2019F3","DOIUrl":"https://doi.org/10.15439/2019F3","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Advanced information technologies have enabled the development of online marketplaces that connect businesses and people on a global scale. Much of the analysis of the adoption, growth and engagement on these marketplaces in the extant literature is based on the premise that they are characterized by network effects, a premise that has major implications for their deployment, implementation and management. In this paper we test this premise using data from Kiva, the world’s largest online, peer-to-peer social lending marketplace. We find that while network effects are strong and significant during the early growth phase of the marketplace, they become weak or disappear once the marketplace stabilizes.
{"title":"Network Effects in Online Marketplaces: The Case of Kiva","authors":"H. Mendelson, Yuanyuan Shen","doi":"10.15439/2019F76","DOIUrl":"https://doi.org/10.15439/2019F76","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Corporate reputation is an economic asset and its accurate measurement is of increasing interest in practice and science. This measurement task is difficult because reputation depends on numerous factors and stakeholders. Traditional measurement approaches have focused on human ratings and surveys, which are costly, can be conducted only infrequently and emphasize financial aspects of a corporation. Nowadays, online media with comments related to products, services, and corporations provides an abundant source for measuring reputation more comprehensively. Against this backdrop, we propose an information retrieval approach to automatically collect reputation-related text content from online media and analyze this content by machine learning-based sentiment analysis. We contribute an ontology for identifying corporations and a unique dataset of online media texts labelled by corporations’ reputation. Our approach achieves an overall accuracy of 84.4%. Our results help corporations to quickly identify their reputation from online media at low cost.
{"title":"Accurate Retrieval of Corporate Reputation from Online Media Using Machine Learning","authors":"Achim Klein, Martin Riekert, Velizar Dinev","doi":"10.15439/2019F169","DOIUrl":"https://doi.org/10.15439/2019F169","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Many industrial machine vision problems, particularly real-time control of manufacturing processes such as laser cladding, require robust and fast image processing. The inherent disturbances in images acquired during these processes make classical segmentation algorithms unreliable. Among the many convolutional neural networks introduced recently to solve such difficult problems, U-Net balances simplicity with segmentation accuracy. However, it is too computationally intensive for use in many real-time processing pipelines. In this work we present a method of identifying the most informative levels of detail in the U-Net. By processing the image only at the selected levels, we reduce the total computation time by 80% while still preserving adequate segmentation quality.
{"title":"Improving Real-Time Performance of U-Nets for Machine Vision in Laser Process Control","authors":"Przemyslaw Dolata, J. Reiner","doi":"10.15439/2019F190","DOIUrl":"https://doi.org/10.15439/2019F190","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
Sales forecasting is an essential element for implementing sustainable business strategies in the automotive industry. Accurate sales forecasts enhance the competitive edge of car manufacturers in their effort to optimize production planning processes. We propose a forecasting technique that combines keyword-specific customer online search data with economic variables to predict monthly car sales. To isolate online search data related to pre-purchase information search, we follow a backward induction approach and identify those keywords that are frequently used by search engine users. In a set of experiments using real-world sales data and Google Trends, we find that our keyword-specific forecasting technique reduces the out-of-sample error by 5% compared to existing techniques without systematic keyword selection. We also find that our regression models outperform the benchmark model in out-of-sample prediction accuracy by up to 27%.
{"title":"Predicting Automotive Sales using Pre-Purchase Online Search Data","authors":"Philipp Wachter, Tobias Widmer, Achim Klein","doi":"10.15439/2019F239","DOIUrl":"https://doi.org/10.15439/2019F239","journal":{"name":"2019 Federated Conference on Computer Science and Information Systems (FedCSIS)"},"publicationDate":"2019-09-26"}
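The automotive sales abstract above reports a relative reduction in out-of-sample error without naming the error metric. As a minimal sketch of how such a figure can be computed, the Python below assumes mean absolute percentage error (MAPE) as the metric; the helper names and all sales numbers are hypothetical and do not come from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error over paired observations, in percent."""
    return 100.0 * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    ) / len(actual)


def relative_error_reduction(baseline_error, candidate_error):
    """Percentage by which the candidate's error is below the baseline's."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Hypothetical held-out monthly sales and two competing forecasts.
actual = [100.0, 200.0, 400.0]
baseline_forecast = [110.0, 180.0, 440.0]   # each month off by 10%
candidate_forecast = [105.0, 190.0, 420.0]  # each month off by 5%

baseline_error = mape(actual, baseline_forecast)
candidate_error = mape(actual, candidate_forecast)
reduction = relative_error_reduction(baseline_error, candidate_error)
print(f"{baseline_error:.1f}% -> {candidate_error:.1f}% ({reduction:.0f}% lower)")
```

Evaluating both forecasts on the same held-out months is what makes the comparison "out-of-sample"; the reduction is relative to the baseline error, not a difference in percentage points.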