Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829787
Edman Anjos, Junhee Lee, S. Rao
The massive amounts of data processed in modern computational systems are becoming a problem of increasing importance. This data is commonly stored, directly or indirectly, in data exchange languages such as JavaScript Object Notation (JSON), which provide human-readable, platform-agnostic access. This paper describes and analyzes SJSON, a library that explores succinct representations of JSON documents as a means to reduce the memory footprint of documents held in main memory and to allow JSON files stored on disk to be compressed. In SJSON, the document structure is represented with succinct trees rather than the usual pointer-based implementation. The remaining raw data are organized into arrays of attributes and values: attributes are stripped of redundancies and stored in a simple contiguous array, while values are represented through an array indexed by a bit string. The proposed scheme is then evaluated against popular libraries on a number of metrics, and possible improvements to the representation are presented.
Title: SJSON: A succinct representation for JavaScript object notation documents. Published in: 2016 Eleventh International Conference on Digital Information Management (ICDIM).
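The abstract does not spell out the exact encoding, so the following is only a minimal Python sketch of the general idea, assuming a balanced-parentheses encoding of the tree (one common succinct tree representation): each node costs two parenthesis symbols instead of a pointer, attribute names are stored once in a deduplicated array, and scalar values are concatenated into one buffer whose start positions are marked by a bit string. The names `encode` and `value_at` are hypothetical and are not SJSON's API; Python strings stand in for bit vectors, and `value_at` uses a linear scan where a real implementation would use a constant-time select index.

```python
import json


def encode(doc):
    """Split a parsed JSON document into succinct-style components.

    Returns (bp, keys, key_ids, values, starts):
      bp      -- balanced parentheses, preorder: "(" opens a node, ")" closes it
      keys    -- each distinct attribute name stored once, in a contiguous list
      key_ids -- for every object member (in preorder), its index into `keys`
      values  -- all scalar values serialized and concatenated into one buffer
      starts  -- bit string with a "1" at the first character of each value
    """
    bp, keys, key_index, key_ids = [], [], {}, []
    values, starts = [], []

    def visit(node):
        bp.append("(")
        if isinstance(node, dict):
            for key, child in node.items():
                if key not in key_index:  # deduplicate attribute names
                    key_index[key] = len(keys)
                    keys.append(key)
                key_ids.append(key_index[key])
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)
        else:
            text = json.dumps(node)  # scalar leaf: never an empty string
            values.append(text)
            starts.append("1" + "0" * (len(text) - 1))
        bp.append(")")

    visit(doc)
    return "".join(bp), keys, key_ids, "".join(values), "".join(starts)


def value_at(values, starts, i):
    """Return the i-th scalar value using select on the `starts` bit string.

    A linear scan stands in for a constant-time select index."""
    ones = [pos for pos, bit in enumerate(starts) if bit == "1"]
    begin = ones[i]
    end = ones[i + 1] if i + 1 < len(ones) else len(values)
    return json.loads(values[begin:end])


doc = {"name": "a", "tags": [1, 2], "nested": {"name": "b"}}
bp, keys, key_ids, values, starts = encode(doc)
print(bp)                           # (()(()())(()))
print(keys, key_ids)                # ['name', 'tags', 'nested'] [0, 1, 2, 0]
print(value_at(values, starts, 3))  # b
```

In this example the repeated key `name` is stored only once, and the document's seven tree nodes need a 14-symbol parenthesis string rather than one pointer per child link.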
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829795
Andrew Jones, S. Vidalis, N. Abouzakhar
The security of Cyber Physical Systems, and of any digital forensic investigation into them, will depend heavily on data that is stored and processed in the Cloud. This paper examines a number of issues that will need to be addressed if this environment is to be trusted to hold both system-critical and personal information securely and to support investigations into incidents.
Title: Information security and digital forensics in the world of cyber physical systems.
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829790
M. Bhanu, Joydeep Chandra
The popularity of community question answering (CQA) forums such as Stack Overflow, Yahoo Answers and Quora is increasing tremendously, with thousands of questions posted each day and roughly three times as many responses. With such a query explosion, users participating in these forums receive a huge number of postings, which adversely affects both their responsiveness and the quality of their responses. Identifying topical experts is therefore necessary to improve the efficacy of these systems in terms of both response time and quality. Although expert detection in CQA forums has long been a topic of wide interest, many of the proposed techniques use feature sets that reflect the popularity of a responder's answers rather than the difficulty of the questions being answered. In this paper we propose measures for labeling difficult questions, and we use the number of difficult questions answered by a user, combined with other user interaction parameters, to identify potential topical experts. Using a random forest classifier with the proposed feature set on Stack Overflow data, we obtain an improvement in accuracy of 5–16% over existing techniques in detecting topical experts.
Title: Exploiting response patterns for identifying topical experts in StackOverflow.
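The abstract does not give the exact difficulty measures, so this sketch assumes a hypothetical rule purely for illustration: a question counts as difficult if its first answer arrived at least `min_delay_hours` after it was asked, or if it received at most `max_answers` answers. It then derives per-user counts of answered and difficult-answered questions, the kind of feature the paper combines with other interaction parameters. `Question`, `is_difficult` and `user_features` are hypothetical names; the resulting vectors could be fed to a classifier such as scikit-learn's `RandomForestClassifier`, which is left out here to keep the example dependency-free.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Question:
    qid: int
    asked_at: float                              # hours since an arbitrary epoch
    answers: list = field(default_factory=list)  # [(user_id, answered_at), ...]


def is_difficult(q, min_delay_hours=24.0, max_answers=1):
    """Hypothetical difficulty label for an answered question."""
    first_answer = min(t for _, t in q.answers)
    slow = first_answer - q.asked_at >= min_delay_hours
    scarce = len(q.answers) <= max_answers
    return slow or scarce


def user_features(questions, **label_kwargs):
    """Map user -> (answered, difficult_answered, difficult_ratio)."""
    stats = defaultdict(lambda: [0, 0])
    for q in questions:
        if not q.answers:
            continue  # an unanswered question credits no responder
        hard = is_difficult(q, **label_kwargs)
        for user in {u for u, _ in q.answers}:  # count each user once per question
            stats[user][0] += 1
            stats[user][1] += int(hard)
    return {u: (n, d, d / n) for u, (n, d) in stats.items()}


questions = [
    Question(1, 0.0, [("alice", 30.0)]),  # slow first answer and a single answer
    Question(2, 0.0, [("alice", 1.0), ("bob", 2.0), ("carol", 3.0)]),
    Question(3, 10.0, [("bob", 11.0), ("alice", 12.0)]),
]
features = user_features(questions)
print(features["alice"], features["bob"])
```

Here only the first question is labeled difficult, so `alice` gets one difficult answer out of three while `bob` and `carol` get none; the thresholds are the tunable part of any such labeling rule.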
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829793
S. Hubackova
The paper gives a basic overview of the use of multimedia in foreign language teaching. In the context of education, the concept of multimedia covers several technical and didactic instruments that became part of the educational process as school teaching aids developed. In classical education, the sense most used for absorbing information is hearing; with multimedia, it is vision. Visual perception helps learners not only reach partial learning goals faster but also reduce the overall time of the learning process.
Title: Processing of multimedia aplications and their use in foreign language teaching.
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829763
Mireille Makary, M. Oakes, Fadi Yamout
This paper presents a new technique for building a relevance judgment list for information retrieval test collections without any human intervention. It is based on the number of occurrences of documents in the runs retrieved by several information retrieval systems and on a distance-based measure between documents. The effectiveness of the technique is evaluated by computing the correlation between the ranking of TREC systems obtained with the original relevance judgment list (qrels), built by human assessors, and the ranking obtained with the newly generated qrels.
Title: Towards automatic generation of relevance judgments for a test collection.
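A minimal Python sketch of the occurrence-based idea for a single topic, under stated assumptions: a document is treated as pseudo-relevant if it appears in the top `depth` results of at least `min_runs` runs, and each system is scored with precision at k. The distance-based measure between documents mentioned in the abstract is not modeled. Kendall's tau (tau-a) is used as one standard way to compare two system rankings; the abstract only says "correlation", so this choice and all function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def pseudo_qrels(runs, depth=10, min_runs=2):
    """Documents in the top `depth` of at least `min_runs` runs (one topic)."""
    counts = Counter()
    for ranking in runs.values():
        counts.update(set(ranking[:depth]))  # count each run at most once
    return {doc for doc, c in counts.items() if c >= min_runs}


def precision_at_k(ranking, qrels, k):
    return sum(doc in qrels for doc in ranking[:k]) / k


def kendall_tau(xs, ys):
    """Kendall's tau-a between two score lists over the same systems."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / pairs


runs = {"A": ["d1", "d2", "d3"], "B": ["d1", "d3", "d4"], "C": ["d5", "d6", "d1"]}
human = {"d1", "d2", "d3"}  # hypothetical assessor judgments
auto = pseudo_qrels(runs, depth=3, min_runs=2)
systems = sorted(runs)
tau = kendall_tau([precision_at_k(runs[s], human, 3) for s in systems],
                  [precision_at_k(runs[s], auto, 3) for s in systems])
print(sorted(auto), round(tau, 3))  # ['d1', 'd3'] 0.667
```

In a full evaluation the per-topic scores would be averaged over all topics before ranking systems; the single-topic version keeps the mechanics visible.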
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829761
Supattana Sukrat, P. Mahatanankoon, B. Papasratorn
While large retailers use social commerce to increase consumer trust and online sales, a variety of online transactions also take place among social networking users. These new e-business models, often conceptualized as C2C s-commerce, have become one of the most popular methods of consumer online trading. The article proposes four phases of C2C s-commerce (i.e., ad-hoc, empowered, organized, integrated) and examines how consumers can leverage the features and functionalities of social networking websites to their benefit. These four phases trace the evolution of C2C s-commerce business models and point to their future implications.
Title: The evolution of C2C social commerce models.
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829757
Yasen Yakufu, C. Atay
With the wide availability of GPS devices, massive amounts of movement data have been collected from various moving objects, such as mobile devices, animals, and vehicles. Over the last decade, Moving Object Databases (MOD) have attracted many researchers, and analyzing such data has deep implications in many areas, such as ecological studies and traffic control. In this study, we focus on analyzing moving object data (moving points) and retrieving valuable information for knowledge discovery. A moving object data model is implemented in an object-relational database system, and a number of specialized queries and data mining techniques are applied to it. Retrieving information directly from unorganized spatio-temporal data is almost impossible; this implementation makes it possible both to organize large spatio-temporal data sets into the MOD data model and to discover valuable knowledge from them to support decision-making processes.
Title: A data mining application on moving object data.
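To illustrate what a moving-point type in a MOD data model can look like, here is a minimal Python sketch, assuming time-ordered GPS samples with linear motion between consecutive samples. It supports an at-instant lookup, trajectory length, and a simple spatio-temporal window query. `MovingPoint`, `at_instant`, `length` and `inside_at` are hypothetical names; the paper's actual object-relational schema and queries are not described in the abstract.

```python
from bisect import bisect_right
from dataclasses import dataclass
from math import hypot


@dataclass
class MovingPoint:
    """Hypothetical moving-point value: GPS samples (t, x, y) sorted by t,
    with linear motion assumed between consecutive samples."""
    samples: list

    def at_instant(self, t):
        """Interpolated (x, y) at time t, or None outside the observed lifespan."""
        times = [s[0] for s in self.samples]
        if not times or t < times[0] or t > times[-1]:
            return None
        i = bisect_right(times, t) - 1  # last sample with time <= t
        if i == len(times) - 1:
            return self.samples[i][1:]
        t0, x0, y0 = self.samples[i]
        t1, x1, y1 = self.samples[i + 1]
        f = (t - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

    def length(self):
        """Total distance travelled along the sampled polyline."""
        pairs = zip(self.samples, self.samples[1:])
        return sum(hypot(b[1] - a[1], b[2] - a[2]) for a, b in pairs)


def inside_at(objects, t, xmin, ymin, xmax, ymax):
    """Ids of objects whose position at time t lies in the given rectangle."""
    hits = []
    for oid, mp in objects.items():
        pos = mp.at_instant(t)
        if pos is not None and xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax:
            hits.append(oid)
    return hits


car = MovingPoint([(0, 0.0, 0.0), (10, 100.0, 0.0), (20, 100.0, 50.0)])
print(car.at_instant(15), car.length())                   # (100.0, 25.0) 150.0
print(inside_at({"car": car}, 5, 40.0, -1.0, 60.0, 1.0))  # ['car']
```

In an object-relational DBMS the same operations would typically be exposed as user-defined functions over a stored trajectory type, so that window queries like `inside_at` can be expressed in SQL.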
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829778
Antonio Corradi, L. Foschini, Alessandro Zanni, Mirco Casoni, S. Monti, Francesco Sprotetto
Data Governance and Federation in large, complex organizations pose non-trivial challenges due to the integration of heterogeneous, distributed data sources. The Semantic Web and its de facto standard query language, SPARQL, have proven key to defining and searching semantics over any sort of content on the Web, letting content clients easily discover hidden relationships among disparate data. However, current SPARQL support for Data Federation is fairly limited, which makes it impractical for real-world scenarios. Our work proposes an open and autonomous platform for Data Federation that overcomes traditional SPARQL limitations and opens up unprecedented opportunities for Data Governance in large organizations.
Title: A federation model to support semantic SPARQL queries for enterprise data governance.
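Standard SPARQL 1.1 lets a query delegate a subpattern to a remote endpoint with the `SERVICE` keyword; whatever the delegation mechanism, a federation layer must eventually join the solution mappings returned by each source. This Python sketch shows only that join step, as a hash join over shared variables, and assumes every mapping from a given source binds the same set of variables (no OPTIONAL results). The function `join_bindings` and the sample data are hypothetical and do not represent the authors' platform.

```python
from collections import defaultdict


def join_bindings(left, right):
    """Hash join of two lists of SPARQL-style solution mappings (var -> value).

    Mappings are compatible when they agree on every shared variable; with no
    shared variables the result is the cross product, as in SPARQL's join."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for right_row in right:  # build phase over one source
        index[tuple(right_row[v] for v in shared)].append(right_row)
    joined = []
    for left_row in left:    # probe phase over the other source
        for right_row in index.get(tuple(left_row[v] for v in shared), []):
            joined.append({**left_row, **right_row})
    return joined


# Hypothetical results from two endpoints queried in a federation.
hr = [{"emp": "e1", "name": "Ana"}, {"emp": "e2", "name": "Luca"}]
projects = [{"emp": "e1", "project": "P7"}, {"emp": "e3", "project": "P9"}]
print(join_bindings(hr, projects))  # [{'emp': 'e1', 'name': 'Ana', 'project': 'P7'}]
```

Building the index over the smaller result set keeps memory bounded by that source, which matters when one endpoint returns far more bindings than the other.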
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829770
Antonio F. G. Sevilla, Alberto Fernández-Isabel, Alberto Díaz
Grafeno is a Natural Language Processing library for doing semantics. It represents semantic information with a graph structure, and is able to automatically extract this representation from the dependency analysis of a text. It aims to encompass the different possible approaches to doing graph semantics by being as modular and flexible as possible. It also provides functionality for operating on the graph and performing different experiments. In this article, we explain its design and use, and show its potential with two use cases.
Title: Grafeno: Semantic graph extraction and operation.
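A minimal Python sketch of extracting a semantic graph from a dependency analysis, assuming Universal Dependencies-style relation labels and a small hypothetical mapping (`nsubj` to AGENT, `obj` to THEME, `amod` to ATTR): content words become nodes and mapped dependencies become labeled edges. `DEP_TO_SEM`, `CONTENT_POS` and `extract_graph` are illustrative names, not Grafeno's API.

```python
# Hypothetical mapping from UD-style dependency relations to semantic roles.
DEP_TO_SEM = {"nsubj": "AGENT", "obj": "THEME", "amod": "ATTR"}
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ"}


def extract_graph(tokens, deps, mapping=DEP_TO_SEM):
    """Build a semantic graph from a dependency parse.

    tokens: [(token_id, lemma, upos), ...]
    deps:   [(head_id, relation, dependent_id), ...]
    Content words become nodes; dependencies with a mapped relation between
    two content words become labeled edges. Everything else is dropped."""
    nodes = {tid: lemma for tid, lemma, pos in tokens if pos in CONTENT_POS}
    edges = [
        (nodes[head], mapping[rel], nodes[dep])
        for head, rel, dep in deps
        if rel in mapping and head in nodes and dep in nodes
    ]
    return nodes, edges


# "The cat chased a small mouse", parsed by hand for illustration.
tokens = [(1, "the", "DET"), (2, "cat", "NOUN"), (3, "chase", "VERB"),
          (4, "a", "DET"), (5, "small", "ADJ"), (6, "mouse", "NOUN")]
deps = [(2, "det", 1), (3, "nsubj", 2), (6, "det", 4), (6, "amod", 5), (3, "obj", 6)]
nodes, edges = extract_graph(tokens, deps)
for edge in edges:
    print(edge)
```

Edges are labeled with lemmas here for readability; keeping node ids in the edges instead would let two occurrences of the same lemma remain distinct nodes, and swapping `mapping` is one way such a design stays modular across different semantic theories.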
Pub Date: 2016-09-01, DOI: 10.1109/ICDIM.2016.7829756
Miguel Angel López Peña, C. Rua, Sergio Segovia Lozoya
Fast Data is a new Big Data computing paradigm that ensures requirements such as real-time processing of continuous data streams, storage at high rates, and low latency with no data loss. In this work we propose a “Fast Data” architecture for a specific kind of software application in which input data arrive very fast and results must be produced at a rate that matches the input rate. We applied this architecture to build a Dashboard for Anomalous Traffic Analysis in Data Networks. To meet the requirements of real-time processing with no data loss, we designed a dynamic tree of process pipelines, in which the number of branches increases in proportion to the input data rate. Two different approaches were followed to implement this design pattern: one based on a well-known set of products from the Big Data ecosystem, and the other built with Kafka, Zookeeper and a set of components we designed and implemented ourselves. The two implementations were compared in terms of velocity and scalability. The implementation built with our own components proved significantly faster and more scalable than the traditional one. The good results obtained with both the dynamic-tree-of-pipelines design pattern and our implementation make them well suited to other scenarios and applications, such as smart cities, environment monitoring, Industry 4.0 and distributed control systems.
Title: A “Fast Data” architecture: Dashboard for anomalous traffic analysis in data networks.
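The core scaling rule of the dynamic tree of process pipelines, with the number of branches growing in proportion to the input rate, can be sketched in Python as follows. This assumes each branch sustains a fixed, known throughput (`branch_capacity`, records per second) and that records are dispatched round-robin; `branches_needed` and `PipelineTree` are hypothetical names, and shrinking the tree (which would require draining branches first) is omitted.

```python
import math


def branches_needed(input_rate, branch_capacity, min_branches=1):
    """Smallest branch count whose total capacity covers the input rate."""
    return max(min_branches, math.ceil(input_rate / branch_capacity))


class PipelineTree:
    """Hypothetical root dispatcher fanning records out to parallel branches.

    Each branch is modeled as a plain list acting as its input queue."""

    def __init__(self, branch_capacity):
        self.branch_capacity = branch_capacity
        self.branches = [[]]  # start with a single pipeline branch

    def rescale(self, input_rate):
        # Grow only: add branches until capacity matches the observed rate.
        target = branches_needed(input_rate, self.branch_capacity)
        while len(self.branches) < target:
            self.branches.append([])

    def dispatch(self, records):
        # Round-robin assignment spreads load evenly across branches.
        for i, record in enumerate(records):
            self.branches[i % len(self.branches)].append(record)


tree = PipelineTree(branch_capacity=10_000)  # records/s per branch (assumed)
tree.rescale(input_rate=25_000)
tree.dispatch(range(7))
print(len(tree.branches), tree.branches)  # 3 [[0, 3, 6], [1, 4], [2, 5]]
```

In a Kafka-based deployment, one natural (but assumed) mapping is one branch per topic partition, each consumed by a member of a consumer group, so adding branches corresponds to adding partitions and consumers.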