Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137298
Linhai Song, Xueqi Cheng, Yan Guo, Yue Liu, Guodong Ding
Web pages are often decorated with extraneous information (such as navigation bars, branding banners, JavaScript and advertisements). This kind of information may distract users from the content they are actually interested in and may reduce the effectiveness of many advanced web applications. Automatic content extraction has many applications, ranging from providing data for web mining to enabling better access to the web on mobile devices. In this paper, we propose ContentEx, a framework for automatic content extraction programs, which we use to organize the code of such programs and to facilitate the development of related solutions. We also describe how we extract content from forum pages within this framework to meet the requirements of our own application.
Title : ContentEx: A framework for automatic content extraction programs
Journal : 2009 IEEE International Conference on Intelligence and Security Informatics
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137262
Ryan Layfield, Murat Kantarcioglu, B. Thuraisingham
Bioterrorism represents a serious threat to the security of civilian populations. The nature of an epidemic requires careful consideration of all possible vectors over which an infection can spread. Our work takes the SIR model and creates a detailed hybridization of existing simulations to allow a large search space to be explored. We then create a Stackelberg game to evaluate all possibilities with respect to the investment of available resources and consider the resulting scenarios. Our analysis of our experimental results yields the opportunity to place an upper bound on the worst case scenario for a population center in the event of an attack, with consideration of defensive and offensive measures.
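The abstract names the SIR model but does not spell it out. As a reminder of what the basic model computes, here is a minimal sketch of the standard SIR compartment equations, integrated with a simple forward-Euler step. It is not the authors' hybrid simulation or their Stackelberg game; the function name `simulate_sir` and the parameter values in the usage note are illustrative assumptions.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the classic SIR model.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    beta is the transmission rate and gamma the recovery rate (per day).
    Returns a list of (time, S, I, R) tuples. All names are illustrative.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt   # flow S -> I in this step
        new_recoveries = gamma * i * dt          # flow I -> R in this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history
```

For example, `simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=100)` traces an outbreak in a population of 1,000 with a basic reproduction number of beta/gamma = 3; the peak of the infected column is one simple measure of how bad a given scenario gets.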
Title : On the mitigation of bioterrorism through game theory
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137270
A. Badia
We propose a solution to the problem of exploring large, complex data sets in a (relational) database by a human user. In a nutshell, our proposed solution is to develop the tools to support data exploration and browsing in an organized manner. We introduce: (a) an organization of data around the idea of defining distances on data sets, to reflect the intuitive notion of data that is (closely) related to other data; and (b) a new set of operators to mediate user-data interaction that exploits the defined distances to let the user “move around” the data. Our operators are simple and intuitive, yet they can be combined in a flexible manner to support complex browsing interactions. The operators are combined in a novel interface that enables intelligent data exploration and analysis.
Title : Supporting data exploration in databases
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137322
B. Rajendran, K. Iyakutti
A goal of user-assisting agents is to provide effective assistance to their users in their tasks. The problem becomes challenging when the users are involved in a cognitive activity, such as a knowledge-gathering task on the web, and when the agents cannot clearly know their users' tasks in advance. The challenge grows when the agents themselves do not possess the knowledge required for such tasks. For this scenario, we propose a socio-contextual model of knowledge sharing within a community of user-assisting agents, provided through the environment of an open-domain knowledge portal that the users rely on for their web-based knowledge-gathering tasks. The agents take part in a knowledge-sharing exercise by implementing the socio-contextual model, which may allow each agent to gain knowledge about a task of interest from fellow agents and thereby assist its own user. We evaluate our model through an experiment involving many knowledge-gathering tasks from diverse domains, and the results indicate interesting implications of the model with respect to the agents, their tasks and their community.
Title : Socio-contextual model of knowledge sharing among user assisting agents
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137299
Bo Wu, Xueqi Cheng, Yu Wang, Gang Zhang, Guodong Ding
Current approaches for generating wrappers for web page extraction require a large number of labeled training pages to obtain satisfactory results. On the other hand, the quality of data extracted by fully automatic methods is not reliable. In this paper, we propose a novel method to facilitate wrapper generation by combining wrapper induction and page analysis approaches. In addition to manually labeled data, we also take advantage of a set of unlabeled pages to improve the quality of induced wrappers. Our experiments demonstrate that our system achieves satisfactory results with fewer manually labeled training pages.
Title : Facilitating wrapper generation with page analysis
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137303
Allyson M. Hoss, D. Carver
Numerous challenges currently face digital forensic analysis. Although a variety of techniques and tools exist to assist with the analysis of digital evidence, they inadequately address key problems. We consider the applicability and usefulness of weaving ontologies to address some of these problems. We introduce an ontological approach leading to future development of an automated digital forensic analysis tool.
Title : Weaving ontologies to support digital forensic analysis
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137283
Vikas Menon, W. Pottenger
Labeled data is scarce. Most statistical machine learning techniques rely on the availability of a large labeled corpus for building robust models for prediction and classification. In this paper we present a Higher Order Collective Classifier (HOCC) based on Higher Order Learning, a statistical machine learning technique that leverages latent information present in co-occurrences of items across records. These techniques violate the IID assumption that underlies most statistical machine learning techniques and have, in prior work, outperformed first-order techniques in the presence of very limited data. We present results of applying HOCC to two different network data sets: first, for detection and classification of anomalies in a Border Gateway Protocol dataset, and second, for building models of users from Network File System calls to perform masquerade detection. The precision of our system has been shown to be 30% better than the standard Naive Bayes technique for masquerade detection. These results indicate that HOCC can successfully model a variety of network events and can be applied to solve difficult problems in security using the general framework proposed.
Title : A Higher Order Collective Classifier for detecting and classifying network events
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137292
Stephen Kelley, M. Goldberg, M. Magdon-Ismail, Konstantin Mertsalov, W. Wallace, Mohammed J. Zaki
In this work, we present the software library graphOnt. The purpose of this library is to automate the process of dynamically extracting “interesting” graphs from semantic networks. Instructions for the extraction are fed into the library via an ontological language specification custom built for this application. A set of SPARQL queries is used to define vertices and edges in the constructed graph. Extracted graphs are returned using the JUNG framework, which offers many algorithmic and visualization options. This work allows a set of individuals analyzing the same semantic network to extract and analyze dynamically created graphs using sophisticated, specific algorithmic tools without needing to manually construct classical graphs from the data.
Title : graphOnt: An ontology based library for conversion from semantic graphs to JUNG
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137272
Yulei Zhang, Yan Dang, Hsinchun Chen
As an important type of social media, the political Web forum has become a major communication channel for people to discuss and debate political, cultural and social issues. Although the Internet has a male-dominated history, more and more women have started to share their concerns and express opinions through online discussion boards and Web forums. This paper presents an automated approach to gender difference analysis of political Web forums. The approach uses rich textual feature representation and machine learning techniques to examine the online gender differences between female and male participants on political Web forums by analyzing writing styles and topics of interest. The results of gender difference analysis performed on a large and long-standing international Islamic women's political forum are presented, showing that female and male participants have significantly different topics of interest.
Title : Gender difference analysis of political web forums: An experiment on an international islamic women's forum
Pub Date : 2009-06-08 DOI: 10.1109/ISI.2009.5137286
Daniel T. Schmitt, S. Kurkowski, M. Mendenhall
An increasing volume of surveillance video is being collected by various organizations, which has led to a need for automated video systems in order to reduce reviewing time. Using persistent video gathered from an aircraft overhead, as is done with unmanned aerial systems in Iraq and Afghanistan, we get a bird's-eye view of vehicular activity. From these activities we can use a model to detect suspicious surveillance activity (casing). This paper builds a model to detect casing events and tests it using Global Positioning System (GPS) tracks generated from vehicles driving in an urban area to show the effectiveness of the model. The results show that several vehicles can be monitored at once in real time. Additionally, the model detects when vehicles are casing buildings and which buildings they are targeting.
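The abstract does not describe the detection model itself. Purely to illustrate the kind of signal a GPS track can provide, the sketch below uses a stand-in heuristic: count how many separate times a track comes within a fixed radius of a building, and flag the track if it returns repeatedly. This is an assumption for illustration, not the authors' model; the function names, the 100 m radius and the three-pass threshold are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_passes(track, building, radius_m=100.0):
    """Count separate entries of a track into the circle around a building.

    track is a time-ordered list of (lat, lon) fixes; building is one (lat, lon).
    Consecutive fixes inside the circle count as a single pass.
    """
    passes = 0
    inside = False
    for lat, lon in track:
        near = haversine_m(lat, lon, building[0], building[1]) <= radius_m
        if near and not inside:
            passes += 1  # the vehicle has just entered the circle
        inside = near
    return passes


def is_casing(track, building, radius_m=100.0, min_passes=3):
    """Hypothetical rule: repeated returns to the same building look suspicious."""
    return count_passes(track, building, radius_m) >= min_passes
```

Running `is_casing` for each building in a list would also indicate which building a flagged vehicle keeps returning to. A realistic detector would additionally need to account for speed, dwell time and ordinary traffic patterns.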
Title : Automated casing event detection in persistent video surveillance