Integration of information needs and seeking
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583077
S. Al-Fedaghi
This paper investigates the problem of how to model the need for information, and the process of seeking information that satisfies that need. It utilizes a proposed information flow model that integrates information needs, information seeking, and information-based activities. The model includes the phases of need generation and propagation that are transformed into information seeking, which in turn triggers information flow to enable fulfillment of needs.
{"title":"Integration of information needs and seeking","authors":"S. Al-Fedaghi","doi":"10.1109/IRI.2008.4583077","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583077","url":null,"abstract":"This paper investigates the problem of how to model the need for information, and the process of seeking information that satisfies that need. It utilizes a proposed information flow model that integrates information needs, information seeking, and information-based activities. The model includes the phases of need generation and propagation that are transformed into information seeking, which in turn triggers information flow to enable fulfillment of needs.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122741807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crawling programs for wrapper-based applications
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583023
Claudio Bertoli, Valter Crescenzi, P. Merialdo
Many large web sites provide pages containing highly valuable data. In order to extract data from these pages, several methods and techniques have been developed to generate web wrappers, that is, programs that convert the data embedded in HTML pages into a structured format. These techniques ease the burden of writing applications that reuse data from the web. However, wrapper generation is just one of the ingredients needed for the development of such applications. A necessary yet underestimated task is that of developing programs for driving a crawler towards the pages that contain the target data. We present a method and an associated tool to support this activity. Our method relies on a data model whose constructs allow a designer to define an intensional description of the organization of data in a web site. Based on the model, we introduce the concepts of (i) intensional navigation, which represents an abstract description of the navigation to be performed to reach pages of interest, and (ii) extensional navigation, which represents the actual set of navigation paths (i.e., sequences of links to be followed) that lead to the target pages. The method is supported by a tool that infers an intensional navigation, i.e., the crawling program, from one sample extensional navigation. The tool, which has been developed as a Firefox plug-in, supports the designer in the task of defining and verifying the sample navigation and the inferred crawling program.
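To make the distinction concrete, the following sketch unfolds an intensional navigation description (page classes plus the links to follow from each class) into the extensional navigation paths at crawl time. It is a minimal illustration of the idea under assumed names, not the paper's tool: NavigationStep, crawl, and the CSS-selector link descriptions are all hypothetical.

    # Minimal sketch: an "intensional" description (which links to follow from which
    # page class) is unfolded into concrete ("extensional") navigation paths.
    # NavigationStep/crawl and the CSS selectors are illustrative assumptions.
    from dataclasses import dataclass
    from urllib.parse import urljoin
    import requests
    from bs4 import BeautifulSoup

    @dataclass
    class NavigationStep:
        page_class: str      # page class this step starts from
        link_selector: str   # CSS selector matching the links to follow
        next_class: str      # page class reached by following those links

    def crawl(url, steps, current_class="home"):
        """Yield the URLs of target pages reached by unfolding the description."""
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        applicable = [s for s in steps if s.page_class == current_class]
        if not applicable:          # leaf page class: this is a target page
            yield url
            return
        for step in applicable:
            for a in soup.select(step.link_selector):
                if a.get("href"):
                    yield from crawl(urljoin(url, a["href"]), steps, step.next_class)

    # Example intensional navigation: home -> category pages -> item (target) pages.
    steps = [
        NavigationStep("home", "a.category", "category"),
        NavigationStep("category", "a.item", "item"),
    ]
    # for target_url in crawl("https://example.com/", steps): print(target_url)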
{"title":"Crawling programs for wrapper-based applications","authors":"Claudio Bertoli, Valter Crescenzi, P. Merialdo","doi":"10.1109/IRI.2008.4583023","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583023","url":null,"abstract":"Many large web sites provide pages containing highly valuable data. In order to extract data from these pages several methods and techniques have been developed to generate web wrappers, that is, programs that convert into a structured format the data embedded into HTML pages. These techniques easy the burden of writing applications that make reuse of data from the web. However the generation of wrappers is just one of the ingredients needed to the development of such applications. A necessary yet underestimated task is that of developing programs for driving a crawler towards the pages that contain the target data. We present a method and an associated tool to support this activity. Our method relies on a data model whose constructs allows a designer to define an intensional description of the organization of data in a web site. Based on the model, we introduce the concepts of (i) intensional navigation, which represents an abstract description of the navigation to be performed to reach pages of interest, and (ii) extensional navigation, which represents the actual set of navigation paths (i.e. sequences of links to be followed) that lead the target pages. The method is supported by a tool that infers an intensional navigation, i.e. the crawling program, from one sample extensional navigation. The tool, which has been developed as a Firefox plug-in, supports the designer in the task of defining and verifying the sample navigation and the inferred crawling program.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"427 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115657312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Empirical-based design — quality-driven assembly of components
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583063
M. Kunz, S. Mencke, D. Rud, R. Dumke
The importance of providing integration architectures in every field of application is beyond controversy these days. Unfortunately, existing solutions focus mainly on functionality. For the long-term success of systems integration, however, the quality of the developed architectures is of substantial interest. Existing quality-related information can be reused to optimize the assembly of components so that the best possible combination is always provided. For this purpose, a framework for the quality-driven creation of architectures is proposed in this paper. Beyond this quality-oriented characteristic, the use of semantic knowledge and structured process descriptions enables an automatic procedure; the combination of both is a particularly promising approach.
{"title":"Empirical-based design — quality-driven assembly of components","authors":"M. Kunz, S. Mencke, D. Rud, R. Dumke","doi":"10.1109/IRI.2008.4583063","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583063","url":null,"abstract":"The importance of providing integration architectures in every field of application is beyond controversy these days. Unfortunately existing solutions are mainly focusing on functionality. But for the success of Systems Integration in the long run, the quality of developed architectures is of substantial interest. Existing quality-related information can be reused to optimize this assembly of components to thereby always provide the best possible combination. For this purpose a framework for the quality-driven creation of architectures is proposed in this paper. Besides this quality-oriented characteristic, the usage of semantic knowledge and structured process descriptions enable an automatic procedure. Especially the combination of both is a promising approach.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124455862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coupling data understanding with software reuse
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583014
G. S. Novak
Reuse of information requires an ability to understand data gathered from the web and to integrate that data with knowledge and reusable programs. We describe systems that allow a user to capture and understand data from the web and rapidly and easily write programs to analyze the data and combine it with other data. A data grokker parses data, inferring the data types of its fields both from field names and from values of the data itself; this produces both a local set of usable data and a set of data type descriptions that link the data to known types. The known types have knowledge and reusable procedures that can be inherited and used with the data. Web pages that perform calculations or data lookup can be treated as remote procedure calls, allowing calculations, proprietary data and real-time data to be used. We have developed a graphical programming system that can specialize reusable programs for use with data from the web, allowing rapid and easy construction of programs for custom analysis of web data. These systems are illustrated with examples.
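As a rough illustration of the kind of inference described above, the sketch below guesses a field's type from its values and falls back to hints in the field name when the values alone are ambiguous. The rule set and the function names are assumptions for illustration, not the system described in the paper.

    # Illustrative sketch of field-type inference from values plus field-name hints.
    # The tiny rule set here is an assumption, not the paper's data grokker.
    import re
    from datetime import datetime

    def _parses_as_date(v):
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                datetime.strptime(v, fmt)
                return True
            except ValueError:
                pass
        return False

    def infer_type(field_name, values):
        if all(re.fullmatch(r"-?\d+", v) for v in values):
            return "integer"
        if all(re.fullmatch(r"-?\d+\.\d+", v) for v in values):
            return "float"
        if all(_parses_as_date(v) for v in values):
            return "date"
        # Fall back to the field name when the values themselves are ambiguous.
        if any(hint in field_name.lower() for hint in ("price", "cost", "amount")):
            return "currency"
        return "string"

    print(infer_type("year", ["2006", "2007"]))       # -> integer (from the values)
    print(infer_type("price", ["$12.99", "$4.50"]))   # -> currency (from the name)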
{"title":"Coupling data understanding with software reuse","authors":"G. S. Novak","doi":"10.1109/IRI.2008.4583014","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583014","url":null,"abstract":"Reuse of information requires an ability to understand data gathered from the web and to integrate that data with knowledge and reusable programs. We describe systems that allow a user to capture and understand data from the web and rapidly and easily write programs to analyze the data and combine it with other data. A data grokker parses data, inferring the data types of its fields both from field names and from values of the data itself; this produces both a local set of usable data and a set of data type descriptions that link the data to known types. The known types have knowledge and reusable procedures that can be inherited and used with the data. Web pages that perform calculations or data lookup can be treated as remote procedure calls, allowing calculations, proprietary data and real-time data to be used. We have developed a graphical programming system that can specialize reusable programs for use with data from the web, allowing rapid and easy construction of programs for custom analysis of web data. These systems are illustrated with examples.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124810219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using default logic to enhance default logic: preliminary report
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583053
É. Grégoire
This paper is about the fusion of multiple knowledge sources represented using default logic. More precisely, the focus is on solving the problem that occurs when the standard-logic knowledge parts of the sources are contradictory, as default theories trivialize in this case. To overcome this problem, several candidate policies are discussed. Among them, it is shown that replacing each formula belonging to a minimally unsatisfiable subset of formulas by a corresponding supernormal default exhibits appealing features.
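A small worked example of this policy, in standard default-logic notation; the formulas are illustrative and not taken from the paper.

    Suppose the merged standard-logic part is W = {p, ¬p, q}. Its only minimally
    unsatisfiable subset is {p, ¬p}, so each of these two formulas is replaced by a
    supernormal default, i.e., a default with no prerequisite whose justification
    equals its consequent:

        : p / p        and        : ¬p / ¬p

    The remaining base W' = {q} is consistent, so the default theory no longer
    trivializes, while p and ¬p survive as defeasible conclusions: the resulting
    theory has two extensions, Th({q, p}) and Th({q, ¬p}), instead of every formula
    becoming derivable.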
{"title":"Using default logic to enhance default logic: preliminary report","authors":"É. Grégoire","doi":"10.1109/IRI.2008.4583053","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583053","url":null,"abstract":"This paper is about the fusion of multiple knowledge sources represented using default logic. More precisely, the focus is on solving the problem that occurs when the standard-logic knowledge parts of the sources are contradictory, as default theories trivialize in this case. To overcome this problem, several candidate policies are discussed. Among them, it is shown that replacing each formula belonging to minimally unsatisfiable subformulas by a corresponding supernormal default exhibits appealing features.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"86 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125717367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Result identification for biomedical abstracts using Conditional Random Fields
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583016
Ryan T. K. Lin, Hong-Jie Dai, Yue-Yang Bow, Min-Yuh Day, Richard Tzong-Han Tsai, W. Hsu
For biomedical research, the most important parts of an abstract are the result and conclusion sections. Some journals divide an abstract into several sections so that readers can easily identify those parts, but others do not. We propose a method that can automatically identify the result and conclusion sections of any biomedical abstract by formulating this identification problem as a sequence labeling task. Three feature sets (Position, Named Entity, and Word Frequency) are employed with Conditional Random Fields (CRFs) as the underlying machine learning model. Experimental results show that the combination of our proposed feature sets can achieve F-measure, precision, and recall scores of 92.50%, 95.32% and 89.85%, respectively.
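The sketch below shows how this sentence-labeling formulation can be set up with a CRF, using relative position and simple cue words as features. It relies on sklearn-crfsuite as a stand-in CRF implementation; the cue-word features and the toy data are simplified assumptions, not the paper's Position, Named Entity, and Word Frequency feature sets.

    # Sketch: label each sentence of an abstract as OTHER/RESULT/CONCLUSION with a CRF.
    # Feature set and training data are toy assumptions for illustration.
    import sklearn_crfsuite

    def sentence_features(sentences, i):
        sent = sentences[i].lower()
        return {
            "position": i / len(sentences),   # relative position within the abstract
            "has_result_cue": any(w in sent for w in ("show", "achieve", "results")),
            "has_conclusion_cue": any(w in sent for w in ("conclude", "suggest")),
            "length": len(sent.split()),
        }

    def abstract_to_features(sentences):
        return [sentence_features(sentences, i) for i in range(len(sentences))]

    # Toy training data: one abstract, one label per sentence.
    train_sents = [["We study X.", "Results show a 10% gain.", "We conclude X helps."]]
    train_labels = [["OTHER", "RESULT", "CONCLUSION"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit([abstract_to_features(s) for s in train_sents], train_labels)
    print(crf.predict([abstract_to_features(train_sents[0])]))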
{"title":"Result identification for biomedical abstracts using Conditional Random Fields","authors":"Ryan T. K. Lin, Hong-Jie Dai, Yue-Yang Bow, Min-Yuh Day, Richard Tzong-Han Tsai, W. Hsu","doi":"10.1109/IRI.2008.4583016","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583016","url":null,"abstract":"For biomedical research, the most important parts of an abstract are the result and conclusion sections. Some journals divide an abstract into several sections so that readers can easily identify those parts, but others do not. We propose a method that can automatically identify the result and conclusion sections of any biomedical abstracts by formulating this identification problem as a sequence labeling task. Three feature sets (Position, Named Entity, and Word Frequency) are employed with Conditional Random Fields (CRFs) as the underlying machine learning model. Experimental results show that the combination of our proposed feature sets can achieve F-measure, precision, and recall scores of 92.50%, 95.32% and 89.85%, respectively.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123088700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Managing application domains in P2P systems
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583073
Deise de Brum Saccol, Nina Edelweiss, R. Galante, Marcio Roberto de Mello
Peer-to-peer (P2P) systems provide shared access to resources that are spread over the network. In such a scenario, files from the same domain can be found on different peers. When a user poses a query, processing relies mainly on the flooding technique, which is quite inefficient from an optimization standpoint. To address this issue, our work proposes clustering documents from the same domain into super peers. Files related to the same universe of discourse are thus grouped, and query processing is restricted to a subset of the network. The clustering task involves ontology generation, document and ontology matching, and metadata management. This paper details the ontology generation task. The proposed mechanism implements the ontology manager in DetVX, a framework for detecting, managing and querying replicas and versions in a P2P context.
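The sketch below illustrates the routing consequence of this clustering: once documents are grouped by domain under super peers, a query is forwarded only to the super peer responsible for the matching domain instead of being flooded to the whole network. The keyword matching stands in for the paper's ontology-based matching, and all names are illustrative.

    # Sketch: publish documents under the super peer of their domain, then answer
    # queries by consulting only that super peer rather than flooding the network.
    # Domain assignment here is given directly; the paper derives it via ontologies.
    from collections import defaultdict

    super_peers = defaultdict(list)        # domain -> list of (peer, document)

    def publish(peer, document, domain):
        super_peers[domain].append((peer, document))

    def query(terms, domain):
        return [(peer, doc) for peer, doc in super_peers[domain]
                if all(t.lower() in doc.lower() for t in terms)]

    publish("peer-1", "Seismic survey of reservoir X", "geology")
    publish("peer-2", "Annual financial report 2008", "finance")
    print(query(["reservoir"], "geology"))   # only the geology super peer is consulted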
{"title":"Managing application domains in P2P systems","authors":"Deise de Brum Saccol, Nina Edelweiss, R. Galante, Marcio Roberto de Mello","doi":"10.1109/IRI.2008.4583073","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583073","url":null,"abstract":"Peer-to-peer (P2P) systems provide shared access to resources that are spread over the network. In such scenario, files from the same domain can be found in different peers. When the user poses a query, the processing relies mainly on the flooding technique, which is quite inefficient for optimization purposes. To solve this issue, our work proposes to cluster documents from the same domain into super peers. Thus, files related to the same universe of discourse are grouped and the query processing is restricted to a subset of the network. The clustering task involves: ontology generation, document and ontology matching, and metadata management. This paper details the ontology generation task. The proposed mechanism implements the ontology manager in DetVX, a framework for detecting, managing and querying replicas and versions in a P2P context.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116691198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AMSQM: Adaptive multiple super-page queue management
Pub Date: 2008-07-13, DOI: 10.1504/ijids.2009.027658
Moshe Itshak, Y. Wiseman
Super-pages have been around for more than a decade. Some operating systems support super-paging, and recent research papers present interesting ideas on how to integrate super-pages intelligently; however, today's operating system page replacement mechanisms still use the old Clock algorithm, which gives the same priority to small and large pages. In this paper we present a technique that extends the page replacement mechanism into an algorithm that is based on more parameters and is suitable for a super-paging environment.
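A minimal sketch of the kind of policy argued for above: a second-chance scan that, unlike the classic Clock algorithm, also weighs page size when choosing a victim. The scoring rule below is an assumption for illustration and is not the AMSQM algorithm itself.

    # Sketch: second-chance victim selection that prefers reclaiming large pages.
    # The size-based tie-breaking is an illustrative assumption, not AMSQM.
    from dataclasses import dataclass

    @dataclass
    class Frame:
        page_id: int
        size_kb: int        # 4 KB base pages vs. multi-megabyte super-pages
        referenced: bool

    def pick_victim(frames):
        candidates = []
        for f in frames:
            if f.referenced:
                f.referenced = False       # second chance: clear the reference bit
            else:
                candidates.append(f)       # already unreferenced: eviction candidate
        if not candidates:                 # everything was recently used
            candidates = frames
        # Unlike plain Clock, prefer the candidate that frees the most memory.
        return max(candidates, key=lambda f: f.size_kb)

    frames = [Frame(1, 4, False), Frame(2, 2048, False), Frame(3, 4, True)]
    print(pick_victim(frames).page_id)     # -> 2: the 2 MB super-page is reclaimed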
{"title":"AMSQM: Adaptive multiple super-page queue management","authors":"Moshe Itshak, Y. Wiseman","doi":"10.1504/ijids.2009.027658","DOIUrl":"https://doi.org/10.1504/ijids.2009.027658","url":null,"abstract":"Super-Pages have been wandering around for more than a decade. There are some particular operating systems that support Super-Paging and there are some recent research papers that show interesting ideas how to intelligently integrate them; however, nowadays Operating System’s page replacement mechanism still uses the old Clock algorithm which gives the same priority to small and large pages. In this paper we show a technique that enhances the page replacement mechanism to an algorithm based on more parameters and is suitable for a Super-Paging environment.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115591098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data component based management of reservoir simulation models
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583062
Cong Zhang, A. Bakshi, V. Prasanna
The management of reservoir simulation models has been an important need of engineers in the petroleum industry. However, because data is shared among reservoir simulation models, data replication is common and poses many challenges to model management, including management efficiency and data consistency. In this paper, we propose a data-component-based methodology for managing reservoir simulation models. It not only improves management efficiency by removing data replicas, but also facilitates information reuse among multiple models. We first identify the underlying structure of the simulation models and decompose them into three types of components: reservoir realization, design, and simulator configuration. Our methodology then identifies the duplicate components and guarantees that each component has exactly one physical copy in the data repository. By separating the logical connections between the models and the components from the physical data files, our methodology provides a clean and efficient way to manage data sharing relationships among the models.
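The sketch below illustrates the separation of logical references from physical copies: components are stored once in a content-addressed repository and models refer to them by key, so a realization or configuration shared by several models is never duplicated. The class and method names are illustrative assumptions, not the paper's implementation.

    # Sketch: content-addressed storage of model components; duplicate components
    # collapse to a single physical copy. Names are illustrative assumptions.
    import hashlib

    class ComponentRepository:
        def __init__(self):
            self._store = {}                       # content hash -> component payload

        def add(self, payload: bytes) -> str:
            key = hashlib.sha256(payload).hexdigest()
            self._store.setdefault(key, payload)   # duplicates map to the same key
            return key

    class SimulationModel:
        def __init__(self, repo, realization, design, simulator_config):
            # The model keeps logical references (hashes), not physical file copies.
            self.components = {
                "realization": repo.add(realization),
                "design": repo.add(design),
                "simulator_config": repo.add(simulator_config),
            }

    repo = ComponentRepository()
    m1 = SimulationModel(repo, b"realization-A", b"design-1", b"config-x")
    m2 = SimulationModel(repo, b"realization-A", b"design-2", b"config-x")
    print(len(repo._store))   # -> 4 physical copies backing 6 logical references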
{"title":"Data component based management of reservoir simulation models","authors":"Cong Zhang, A. Bakshi, V. Prasanna","doi":"10.1109/IRI.2008.4583062","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583062","url":null,"abstract":"The management of reservoir simulation models has been an important need of engineers in petroleum industry. However, due to data sharing among reservoir simulation models, data replication is common and poses many challenges to model management, including management efficiency and data consistency. In this paper, we propose a data component based methodology to manage reservoir simulation models. It not only improves management efficiency by removing data replicas, but also facilitates information reuse among multiple models. We first identify the underlying structure of the simulation models and decompose them into three types of components: reservoir realization, design, and simulator configuration. Our methodology then identifies the duplicate components and guarantees that each component has one physical copy in the data repository. By separating the logical connections between the models and the components from the physical data files, our methodology provides a clean and efficient way to manage data sharing relationships among the models.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114620682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis methodology for project design utilizing UML
Pub Date: 2008-07-13, DOI: 10.1109/IRI.2008.4583044
Citlalih Gutierrez Estrada, S. D. Zagal, M. N. Perez, Itzel Abundez Barrera, Rocio Elizabeth Pulido Alba, Mauro Sanchez Sanchez, René Arnulfo García-Hernández
This work focuses on the implementation of a methodology for analysis in the early stages of system development. It starts from system specifications expressed in natural language and ends with models of the specified system expressed as different diagrams of the Unified Modeling Language (UML), in order to facilitate information reuse and cooperation. The ultimate goal is to guarantee that a given set of needs is fulfilled, ensuring that the system meets its design requirements and functions correctly and that the system conception process succeeds from the first stages of development.
{"title":"Analysis methodology for project design utilizing UML","authors":"Citlalih Gutierrez Estrada, S. D. Zagal, M. N. Perez, Itzel Abundez Barrera, Rocio Elizabeth Pulido Alba, Mauro Sanchez Sanchez, René Arnulfo García-Hernández","doi":"10.1109/IRI.2008.4583044","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583044","url":null,"abstract":"This work focuses on the implementation of a methodology as a strategy for system development analysis in the early stages, starting from system specifications in natural language and ending with modeling of the specified system using different diagrams, with the help of the Unified Modeling Language (UML) to facilitate information reusability and cooperation. The ultimate goal is to guarantee the fulfillment of a given set of needs, ensuring that the system fulfills its design requirements and functionality correctly, making sure that the system conception process is successful from the first stages of development.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115110781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}