BatCave: Adding security to the BATMAN protocol
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093328
Anne Gabrielle Bowitz, Espen Grannes Graarud, L. Brown, M. Jaatun
The Better Approach To Mobile Ad-hoc Networking (BATMAN) protocol is intended as a replacement for protocols such as OLSR, but just like most such efforts, BATMAN has no built-in security features. In this paper we describe security extensions to BATMAN that control network participation and prevent unauthorized nodes from influencing network routing.
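The abstract does not detail the mechanism, so the sketch below only illustrates the general idea of controlling network participation: routing messages are authenticated so that packets from unauthorized nodes can be dropped before they influence route selection. The pre-shared key and message layout are assumptions for illustration, not BatCave's actual design.

```python
import hmac
import hashlib

# Placeholder admission credential; a real deployment would use per-node keys.
SHARED_KEY = b"network-admission-key"

def sign_ogm(payload: bytes) -> bytes:
    """Attach an HMAC tag so receivers can verify an originator message (OGM)."""
    return payload + hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

def accept_ogm(message: bytes):
    """Return the payload only if the tag verifies; unauthorized nodes
    cannot produce a valid tag, so their routing messages are dropped."""
    payload, tag = message[:-32], message[-32:]
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None

signed = sign_ogm(b"originator=10.0.0.5;seq=42")
assert accept_ogm(signed) == b"originator=10.0.0.5;seq=42"
assert accept_ogm(b"forged" + b"\x00" * 32) is None  # rejected
```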
{"title":"BatCave: Adding security to the BATMAN protocol","authors":"Anne Gabrielle Bowitz, Espen Grannes Graarud, L. Brown, M. Jaatun","doi":"10.1109/ICDIM.2011.6093328","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093328","url":null,"abstract":"The Better Approach To Mobile Ad-hoc Networking (BATMAN) protocol is intended as a replacement for protocols such as OLSR, but just like most such efforts, BATMAN has no built-in security features. In this paper we describe security extensions to BATMAN that control network participation and prevent unauthorized nodes from influencing network routing.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115363050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chart image understanding and numerical data extraction
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093320
Ales Mishchenko, N. Vassilieva
Chart images in digital documents are an important source of valuable information that is largely under-utilized for data indexing and information extraction. We developed a framework to automatically extract the data carried by charts and convert it to XML format. The proposed algorithm classifies images by chart type, detects graphical and textual components, and extracts semantic relations between graphics and text. Classification is performed by a novel model-based method, which was extensively tested against state-of-the-art supervised learning methods and showed high accuracy, comparable to that of the best supervised approaches. The proposed text detection algorithm is applied prior to optical character recognition and leads to a significant improvement in text recognition rate (up to 20 times better). The analysis of graphical components and their relations to textual cues allows the recovery of chart data. For testing purposes, a benchmark set was created with the XML/SWF Chart tool. By comparing the recovered data with the original data used for chart generation, we are able to evaluate our information extraction framework and confirm its validity.
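As a rough illustration of the pipeline's shape (classify by chart type, detect components, relate text to graphics, emit XML), the following sketch uses placeholder stage functions in place of the paper's detectors:

```python
import xml.etree.ElementTree as ET

def classify_chart(image):
    """Stand-in for the paper's model-based chart-type classifier."""
    return "bar"

def detect_components(image):
    """Stand-in for graphical-mark and text-region detection."""
    return {"bars": [120, 80, 95], "labels": ["Q1", "Q2", "Q3"]}

def extract_chart(image) -> str:
    """Run the stages and serialize the recovered data as XML."""
    chart_type = classify_chart(image)
    parts = detect_components(image)
    root = ET.Element("chart", type=chart_type)
    for label, value in zip(parts["labels"], parts["bars"]):
        point = ET.SubElement(root, "point", label=label)
        point.text = str(value)
    return ET.tostring(root, encoding="unicode")

print(extract_chart(image=None))
# <chart type="bar"><point label="Q1">120</point>...</chart>
```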
{"title":"Chart image understanding and numerical data extraction","authors":"Ales Mishchenko, N. Vassilieva","doi":"10.1109/ICDIM.2011.6093320","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093320","url":null,"abstract":"Chart images in digital documents are an important source of valuable information that is largely under-utilized for data indexing and information extraction purposes. We developed a framework to automatically extract data carried by charts and convert them to XML format. The proposed algorithm classifies image by chart type, detects graphical and textual components, extracts semantic relations between graphics and text. Classification is performed by a novel model-based method, which was extensively tested against the state-of-the-art supervised learning methods and showed high accuracy, comparable to those of the best supervised approaches. The proposed text detection algorithm is applied prior to optical character recognition and leads to significant improvement in text recognition rate (up to 20 times better). The analysis of graphical components and their relations to textual cues allows the recovering of chart data. For testing purpose, a benchmark set was created with the XML/SWF Chart tool. By comparing the recovered data and the original data used for chart generation, we are able to evaluate our information extraction framework and confirm its validity.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114804036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Converting Myanmar printed document image into machine understandable text format
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093371
Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun
As large numbers of Myanmar document images are archived by digital libraries, an efficient strategy is needed to convert document images into a machine-understandable text format. State-of-the-art OCR systems cannot handle Myanmar scripts, as the language poses many challenges for document understanding. This paper therefore proposes an OCR system for Myanmar Printed Documents (OCRMPD) with several methods that automatically convert Myanmar printed text into machine-understandable text. First, the input image is enhanced by correcting for noise variants. Then, the characters are segmented with a novel segmentation method. The features of the isolated characters are extracted with a hybrid feature extraction method to overcome the similarity problems of Myanmar scripts. Finally, a hierarchical SVM classifier is used to recognize each character image. Experiments carried out on a variety of Myanmar printed documents show the efficiency of the proposed algorithms.
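The hierarchical classification step can be sketched as a toy two-stage SVM, where a coarse classifier first assigns a glyph to a group of visually similar characters and a per-group classifier then picks the character. The features and labels below are synthetic stand-ins, not the paper's hybrid features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                  # stand-in feature vectors
groups = (X[:, 0] > 0).astype(int)              # coarse label: 2 glyph groups
chars = groups * 2 + (X[:, 1] > 0).astype(int)  # fine label: 4 characters

# Stage 1: coarse SVM over all samples; stage 2: one SVM per group.
coarse = SVC().fit(X, groups)
fine = {g: SVC().fit(X[groups == g], chars[groups == g]) for g in (0, 1)}

def classify(x):
    """Coarse group first, then the character within that group."""
    g = coarse.predict([x])[0]
    return fine[g].predict([x])[0]

print(classify(X[0]), chars[0])
```

Splitting the decision this way keeps each second-stage classifier focused on glyphs that differ only in small strokes, which is where flat classifiers tend to confuse similar Myanmar characters.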
{"title":"Converting Myanmar printed document image into machine understandable text format","authors":"Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun","doi":"10.1109/ICDIM.2011.6093371","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093371","url":null,"abstract":"The large amount of Myanmar document images are getting archived by the Digital Libraries, an efficient strategy is needed to convert document image into machine understandable text format. The state of the art OCR systems can't do for Myanmar scripts as our language pose many challenges for document understanding. Therefore, this paper plans an OCR system for Myanmar Printed Document (OCRMPD) with several proposed methods that can automatically convert Myanmar printed text to machine understandable text. Firstly, the input image is enhanced by making some correction on noise variants. Then, the characters are segmented with a novel segmentation method. The features of the isolated characters are extracted with a hybrid feature extraction method to overcome the similarity problems of the Myanmar scripts. Finally, hierarchical mechanism is used for SVM classifier for recognition of the character image. The experiments are carried out on a variety of Myanmar printed documents and results show the efficiency of the proposed algorithms.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114864729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a virtual environment for capturing behavior in cultural crowds
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093362
Divesh Lala, Sutasinee Thovutikul, T. Nishida
Cultural behavior is an area of research that can further cross-cultural understanding, and it is now starting to be integrated into the field of information technology. One domain in which these behaviors are expressed is the crowd; however, analyzing micro-level crowd behavior in a real-world setting is impractical, since passive observation is limited in what it can reveal about true behavior. By using a virtual environment to simulate a crowd situation, measuring an individual's in-crowd behavior becomes feasible. This paper introduces the development of a virtual environment that enables the creation of different types of cultural crowds with which the user may interact. The parameterization of the crowds is based on Hofstede's well-known cultural dimensions. One of these dimensions, individualism/collectivism, was mapped to agent characteristics during a series of simulations, and it was found that two distinct types of crowd could be generated. For the dimensions that have not yet been examined, the proposed environment provides an ideal opportunity to address this gap in the research, as well as a tool with which other types of experiments can be performed.
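A minimal sketch of how such a parameterization might look, assuming hypothetical agent parameters (the paper does not list its exact mapping):

```python
from dataclasses import dataclass

@dataclass
class AgentParams:
    preferred_spacing: float  # personal space an agent tries to keep (m)
    group_cohesion: float     # tendency to stay close to its subgroup

def from_individualism(idv: float) -> AgentParams:
    """Map a Hofstede individualism score in [0, 100] to agent behavior:
    high scores widen spacing and weaken subgroup cohesion."""
    t = idv / 100.0
    return AgentParams(preferred_spacing=0.5 + 1.0 * t,
                       group_cohesion=1.0 - t)

print(from_individualism(91))  # a highly individualist setting
print(from_individualism(20))  # a collectivist setting
```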
{"title":"Towards a virtual environment for capturing behavior in cultural crowds","authors":"Divesh Lala, Sutasinee Thovutikul, T. Nishida","doi":"10.1109/ICDIM.2011.6093362","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093362","url":null,"abstract":"Cultural behavior is an area of research that can allow us to further cross-cultural understanding, and is now starting to integrate itself within the field of information technology. One domain that expresses these behaviors is inside a crowd, however the analysis of micro-level crowd behavior is impractical in a real-world setting as passive observation has limitations on understanding true behavior. By using a virtual environment to simulate a crowd situation, measuring an individual's in-crowd behavior becomes feasible. This paper introduces the development of a virtual environment which enables the creation of different types of cultural crowds with which the user may interact. The parameterization of the crowds is based on the famous cultural dimensions put forward by Hofstede. One of the cultural dimensions, individualism/collectivism, was mapped to agent characteristics during a series of simulations and it was found that two distinct types of crowd could be generated. For the dimensions have not yet been examined, the proposed environment provides an ideal opportunity to address this gap in the research as well as becoming a tool with which other types of experimentation can be performed.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124787532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PRIS: Image processing tool for dealing with criminal cases using steganography technique
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093348
R. Ibrahim, Teoh Suk Kuan
Hiding data inside an image is a practical way of concealing secret information from intruders; image processing can then be used to retrieve the data from the image. In this paper, we propose a new algorithm for hiding data inside an image using a steganography technique, with the original data retrievable from the image by the same approach. Applying the proposed algorithm, a system called the Police Report Imaging System (PRIS) was developed to handle secret information for criminal cases, and the system was then tested to assess the viability of the algorithm. The PSNR (peak signal-to-noise ratio) was also recorded for each of the tested images; the stego images show high PSNR values, indicating little perceptible distortion. Hence this new steganography algorithm is effective at hiding data inside an image for handling information in criminal cases.
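PSNR between the cover and stego image follows the standard definition; a minimal NumPy version, with a one-bit change standing in for the paper's (unspecified) embedding algorithm:

```python
import numpy as np

def psnr(original: np.ndarray, stego: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images:
    10 * log10(MAX^2 / MSE)."""
    mse = np.mean((original.astype(np.float64) - stego.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(max_val ** 2 / mse)

cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
stego = cover.copy()
stego[0, 0] ^= 1           # flip one least-significant bit
print(psnr(cover, stego))  # very high PSNR: the change is imperceptible
```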
{"title":"PRIS: Image processing tool for dealing with criminal cases using steganography technique","authors":"R. Ibrahim, Teoh Suk Kuan","doi":"10.1109/ICDIM.2011.6093348","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093348","url":null,"abstract":"Hiding data inside an image is a practical way of hiding secret information from intruders. Image processing can then be used to get the data back from the image. In this paper, we propose a new algorithm to hide data inside an image using the steganography technique. The original data can also be retrieved from the image using the same approach. By applying the proposed algorithm, a system called Police Report Imaging System (PRIS) is developed. PRIS is developed to handle secret information for criminal cases. The system is then tested to see the viability of the proposed algorithm. The PSNR (Peak signal-to-noise ratio) is also captured for each of the images tested. Based on the PSNR value of each image, the stego image has a higher PSNR value. Hence this new steganography algorithm is very efficient to hide data inside an image to handle information for the criminal cases.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127879983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On mining association rules with semantic constraints in XML
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093337
Md. Sumon Shahriar, Jixue Liu
An improved association rule mining technique with semantic constraints is proposed for XML. The semantic constraints are expressed through close properties of items in an XML document that conforms to a schema definition. The proposed technique can be used to mine both content and structure in XML.
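Since the exact "close properties" constraint is schema-specific, the following generic sketch only illustrates constrained mining over XML: transactions are read from the document, and candidate item pairs are counted only when the items co-occur under the same parent element:

```python
import xml.etree.ElementTree as ET
from itertools import combinations
from collections import Counter

doc = ET.fromstring("""
<orders>
  <order><item>milk</item><item>bread</item></order>
  <order><item>milk</item><item>bread</item><item>eggs</item></order>
  <order><item>bread</item><item>eggs</item></order>
</orders>""")

pair_counts = Counter()
for order in doc.iter("order"):  # structural constraint: same parent element
    items = sorted(i.text for i in order.iter("item"))
    pair_counts.update(combinations(items, 2))

min_support = 2  # keep only pairs frequent enough to form rules
print([p for p, n in pair_counts.items() if n >= min_support])
# [('bread', 'milk'), ('bread', 'eggs')]
```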
{"title":"On mining association rules with semantic constraints in XML","authors":"Md. Sumon Shahriar, Jixue Liu","doi":"10.1109/ICDIM.2011.6093337","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093337","url":null,"abstract":"An improved association rule mining technique with semantic constraints is proposed in XML. The semantic constraints are expressed through the use of close properties of items in an XML document that conforms to a schema definition. The proposed association rule mining with semantic constraints can be used for mining both contents and structures in XML.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116982323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying multi-correlation for improving forecasting in cyber security
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093323
E. Pontes, A. Guelfi, S. Kofuji, Anderson A. A. Silva
Currently, defense of cyberspace is mostly based on the detection and/or blocking of attacks (Intrusion Detection and Prevention Systems, IDPS). A significant improvement to IDPS, however, is the use of forecasting techniques in a Distributed Intrusion Forecasting System (DIFS), which enables attacks to be predicted. One of the issues we faced in earlier work was the huge number of alerts produced by the IDPS, many of which were false positives. Checking the veracity of alerts against other sources (multi-correlation), e.g. logs taken from the operating system (OS), is a way of reducing the number of false alerts and thereby improving the data (historical series) used by the DIFS. The goal of this paper is to propose a two-stage system that (1) employs an Event Analysis System (EAS) to perform multi-correlation between IDPS alerts and OS logs, and (2) applies forecasting techniques to the data generated by the EAS. Laboratory tests of the two-stage system show improved reliability of the historical series and a consequent improvement in forecast accuracy.
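A compact sketch of the two stages, with illustrative field names and a simple exponential-smoothing forecaster standing in for the paper's EAS and forecasting techniques:

```python
def corroborate(alerts, os_logs, window=5.0):
    """Stage 1: keep only IDS alerts matched by an OS log event from the
    same host within `window` seconds. Items are (timestamp, host) tuples."""
    return [a for a in alerts
            if any(a[1] == l[1] and abs(a[0] - l[0]) <= window for l in os_logs)]

def exp_smooth_forecast(series, alpha=0.5):
    """Stage 2: one-step-ahead forecast by simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

alerts = [(10.0, "db01"), (12.0, "web02"), (40.0, "db01")]
os_logs = [(11.0, "db01"), (41.0, "db01")]
confirmed = corroborate(alerts, os_logs)  # the uncorroborated web02 alert is dropped
print(confirmed, exp_smooth_forecast([3, 4, 2, 5]))
```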
{"title":"Applying multi-correlation for improving forecasting in cyber security","authors":"E. Pontes, A. Guelfi, S. Kofuji, Anderson A. A. Silva","doi":"10.1109/ICDIM.2011.6093323","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093323","url":null,"abstract":"Currently, defense of the cyber space is mostly based on detection and/or blocking of attacks (Intrusion Detection and Prevention System — IDPS). But, a significant improvement for IDPS is the employment of forecasting techniques in a Distributed Intrusion Forecasting System (DIFS), which enables the capability for predicting attacks. Notwithstanding, during our earlier works, one of the issues we have faced was the huge amount of alerts produced by IDPS, several of them were false positives. Checking the veracity of alerts through other sources (multi-correlation), e.g. logs taken from the operating system (OS), is a way of reducing the number of false alerts, and, therefore, improving data (historical series) to be used by the DIFS. The goal of this paper is to propose a two stage system which allows: (1) employment of an Event Analysis System (EAS) for making multi-correlation between alerts from an IDPS with the OS' logs; and (2) applying forecasting techniques on data generated by the EAS. Tests applied on laboratory by the use of the two stage system allow concluding about the improvement of the historical series reliability, and the consequent improvement of the forecasts accuracy.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117016191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data management and analysis at the Large Scale Data Facility
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093357
A. García, S. Bourov, A. Hammad, T. Jejkal, Jens C. Otte, S. Pfeiffer, T. Schenker, Christian Schmidt, J. V. Wezel, Bernhard Neumair, A. Streit
The Large Scale Data Facility (LSDF) was started at the Karlsruhe Institute of Technology (KIT) at the end of 2009 to address the growing need for value-added storage services for its data-intensive experiments. The main focus of the project is to provide scientific communities that produce data collections in the tera- to petabyte range with the necessary hardware infrastructure, as well as with adequate value-added services and support for data management, processing, and preservation. In this work we describe the design of the project's infrastructure and services, as well as its metadata handling. Community-specific metadata schemes, a metadata repository, an application programming interface, and a graphical tool for accessing the resources were developed to further support the processing workflows of the partner scientific communities. The analysis workflow for high-throughput microscopy images used to study biomedical processes is described in detail.
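The abstract does not specify the repository's API, so the following sketch only illustrates the shape of the workflow (a community-specific schema, a central repository, programmatic registration); every name in it is hypothetical:

```python
import json

# Hypothetical community-specific metadata record for a microscopy run.
record = {
    "schema": "kit-lsdf-microscopy-v1",
    "dataset": "plate-0042/run-7",
    "instrument": "high-throughput-microscope",
    "created": "2011-12-01T10:00:00Z",
    "files": ["img_0001.tif", "img_0002.tif"],
}

def register(repo, rec):
    """Minimally validate the record and store it; a dict stands in for
    the real metadata repository service."""
    assert {"schema", "dataset"} <= rec.keys()
    repo[rec["dataset"]] = json.dumps(rec)

repo = {}
register(repo, record)
print(sorted(repo))  # ['plate-0042/run-7']
```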
{"title":"Data management and analysis at the Large Scale Data Facility","authors":"A. García, S. Bourov, A. Hammad, T. Jejkal, Jens C. Otte, S. Pfeiffer, T. Schenker, Christian Schmidt, J. V. Wezel, Bernhard Neumair, A. Streit","doi":"10.1109/ICDIM.2011.6093357","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093357","url":null,"abstract":"The Large Scale Data Facility (LSDF) was started at the Karlsruhe Institute of Technology (KIT) end of 2009 to address the growing need for value-added storage services for its data intensive experiments. The main focus of the project is to provide scientific communities producing data collections in the tera — to petabyte range with the necessary hardware infrastructure as well as with adequate value-added services and support for the data management, processing, and preservation. In this work we describe the project's infrastructure and services design, as well as its meta data handling. Both community specific meta data schemes, a meta data repository, an application programming interface and a graphical tool for accessing the resources were developed to further support the processing workflows of the partner scientific communities. The analysis workflow of high throughput microscopy images for studying biomedical processes is described in detail.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133194923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geo-local contents system with mobile devices
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093360
Kazunari Ishida
This paper investigates the quality of the geographical information provided by geo-media such as Foursquare, and its geographical distribution. The results show that geographical information contributed by autonomous individuals contains numerous errors and missing data, even though it tends to include detailed street addresses; in addition, the information is heavily clustered in metropolitan areas. To reduce the errors, missing data, and geographically biased distribution of information, geo-local content systems have been developed for mobile devices used by people in local communities. Members of a local community, e.g. shopping districts and tourist spots, have strong incentives to provide high-quality information to their customers. Hence, by providing these systems to such communities, a vast amount of geo-local content can be published on the Internet.
{"title":"Geo-local contents system with mobile devices","authors":"Kazunari Ishida","doi":"10.1109/ICDIM.2011.6093360","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093360","url":null,"abstract":"This paper investigates the quality of geographical information provided by Geo-media, such as Foursquare, and its geographical distribution. According to the result, geographical information described by autonomous individuals contains numerous errors and lacks data, even though it tends to have detailed street addresses. In addition, the information is heavily clustered in metropolitan areas. In order to reduce the errors, lack of data, and geographically-biased distribution of information, geo-local content systems have been developed with mobile devices for people in local communities. Members of a local community, e.g., shopping districts and tourist spots, have strong incentives to provide high quality information to their customers. Hence, the systems are provided to people in these communities so that a vast amount of geo-local contents is going to be published on the Internet.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115836481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of voice features for Arabic speech recognition
Pub Date: 2011-12-01 | DOI: 10.1109/ICDIM.2011.6093369
M. Alsulaiman, Muhammad Ghulam, Z. Ali
The selection of speech features for speech recognition has been investigated for languages other than Arabic. The Arabic language has its own characteristics, so some speech features may be better suited to Arabic speech recognition than others. In this paper, several feature extraction techniques are explored to find the features that give the highest recognition rate. Our investigation shows that Mel-Frequency Cepstral Coefficients (MFCC) give the best results. We also use an operator well known in the image processing field to modify the way MFCC are calculated, resulting in a new feature that we call LBPCC. We propose how this operator is used, and then conduct experiments to test the proposed feature.
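MFCC extraction itself is standard; a minimal example using librosa (a stock implementation, not the authors' code, and without the LBP-style modification, whose exact formulation is specific to the paper):

```python
import numpy as np
import librosa

# Synthetic one-second tone standing in for a speech recording.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
y = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# 13 Mel-frequency cepstral coefficients per analysis frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, number_of_frames)
```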
{"title":"Comparison of voice features for Arabic speech recognition","authors":"M. Alsulaiman, Muhammad Ghulam, Z. Ali","doi":"10.1109/ICDIM.2011.6093369","DOIUrl":"https://doi.org/10.1109/ICDIM.2011.6093369","url":null,"abstract":"Selection of the speech feature for speech recognition has been investigated for languages other than Arabic. Arabic Language has its own characteristics hence some speech features may be more suited for Arabic speech recognition than the others. In this paper, some feature extraction techniques are explored to find the features that will give the highest speech recognition rate. Our investigation in this paper showed that Mel-Frequency Cepstral Coefficients (MFCC) gave the best result. We also look at using an operator well know in image processing field to modify the way we calculate MFCC, this results in a new feature that we call LBPCC. We propose the way we use this operator. Then we conduct some experiments to test the proposed feature.","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123621252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}