Articulating heterogeneous data streams with the attribute-relation file format
Authors: M. Diván, M. Reynoso
Journal of Electrical and Electronics Engineering (2019)
DOI: 10.1063/1.5133936
Citations: 1
Abstract
The processing strategy based on measurement metadata is a data stream engine running on Apache Storm that processes measures in real time. In the data stream context, data have no associated limit: they keep arriving. The Attribute-Relation File Format (ARFF) is used by popular software such as Weka, enabling offline analysis in the machine learning and data mining area. However, an ARFF file has a finite size. The CincamimisConversor library exports data streams organized under a measurement interchange schema to a columnar data organization in real time. Here, an extension to the library is introduced that supports translating heterogeneous data streams into the ARFF format and storing them in real time. This is useful because the library now makes it possible to collect data from heterogeneous data sources (e.g., Internet-of-Things (IoT) devices) and export them in real time for offline analysis in Weka. It could also foster educational applications that combine IoT, the measurement process with heterogeneous sources, the data stream processing strategy, and Weka. A discrete simulation was carried out with promising results: translating 5000 measures required at most 0.2387 ms, while storing them took less than 0.2028 ms on a solid-state disk.
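The abstract does not describe how the extension maps measures to ARFF. Purely as an illustration, the Python sketch below shows one way a bounded window of heterogeneous measures could be serialized as ARFF text. The union of attribute names reported by all sources becomes the @ATTRIBUTE header. Attributes whose values are all numbers are declared NUMERIC, and the rest are declared nominal. Attributes a source did not report are written as the ARFF missing value ?. Because an ARFF file is finite while the stream is not, the sketch cuts the stream into fixed-size windows and emits one ARFF document per window. The functions quote, window_to_arff and windows, the dict-based measure representation, the simplified quoting rules and the windowing policy are all hypothetical assumptions. They are not the CincamimisConversor API.

```python
"""Hypothetical sketch (not the CincamimisConversor API): heterogeneous measures -> ARFF text.

Assumption: each measure is a dict mapping attribute names to values; numbers become
NUMERIC attributes, anything else becomes a nominal attribute, absent keys become '?'.
"""
from typing import Dict, Iterable, Iterator, List, Union

Value = Union[int, float, str]
Measure = Dict[str, Value]

# Characters that force quoting in this simplified version of the ARFF rules.
_SPECIAL = set(" \t,{}'\"%")


def quote(token: str) -> str:
    """Single-quote a name or value that the ARFF reader would otherwise misparse."""
    # A bare '?' would be read as a missing value, so it is quoted as well.
    if token == "" or token == "?" or any(c in _SPECIAL for c in token):
        escaped = token.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return token


def is_number(v: Value) -> bool:
    # bool is a subclass of int in Python; treat it as nominal, not numeric.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def window_to_arff(relation: str, window: List[Measure]) -> str:
    """Serialize one bounded window of measures as a complete ARFF document."""
    # Union of attribute names in first-seen order: sources may report different attributes.
    names = list(dict.fromkeys(k for m in window for k in m))
    lines = [f"@RELATION {quote(relation)}", ""]
    numeric: Dict[str, bool] = {}
    for name in names:
        present = [m[name] for m in window if name in m]
        numeric[name] = all(is_number(v) for v in present)
        if numeric[name]:
            lines.append(f"@ATTRIBUTE {quote(name)} NUMERIC")
        else:
            # Nominal attributes must enumerate their labels in the header.
            labels = sorted({str(v) for v in present})
            spec = ",".join(quote(lab) for lab in labels)
            lines.append(f"@ATTRIBUTE {quote(name)} {{{spec}}}")
    lines += ["", "@DATA"]
    for m in window:
        row = []
        for name in names:
            if name not in m:
                row.append("?")  # ARFF missing-value marker
            elif numeric[name]:
                row.append(str(m[name]))
            else:
                row.append(quote(str(m[name])))
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


def windows(stream: Iterable[Measure], size: int) -> Iterator[List[Measure]]:
    """Cut an unbounded stream into bounded windows, since each ARFF file must be finite."""
    batch: List[Measure] = []
    for m in stream:
        batch.append(m)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so the yielded batch is not mutated
    if batch:
        yield batch


if __name__ == "__main__":
    stream = [
        {"sensor": "temp-01", "temperature": 21.5, "humidity": 40},
        {"sensor": "temp-02", "temperature": 22.0},  # this source reports no humidity
        {"sensor": "gps 7", "latitude": -36.62},     # a different source, different attributes
    ]
    for i, w in enumerate(windows(iter(stream), size=2)):
        print(window_to_arff(f"measures-window-{i}", w))
```

Under these assumptions, each window's text could be written to its own .arff file and opened in Weka. Declaring the nominal labels per window keeps every file self-contained. The trade-off is that two windows may declare different label sets for the same attribute.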
About the journal:
Journal of Electrical and Electronics Engineering is an interdisciplinary, application-oriented scientific publication that offers researchers and PhD students the opportunity to disseminate novel and original scientific and research contributions in the field of electrical and electronics engineering. Articles are reviewed by professionals, and papers are selected solely on the quality of their content according to the following criteria: the paper presents the authors' own research results; the paper and its content have not been submitted or published elsewhere; the paper is written in English; and the reference list should include papers published in recent years in the Journal of Electrical and Electronics Engineering that present similar research results. The journal's topics and instructions for authors can be found in the appropriate sections.