A formal, web-based representation of social networks is both a necessity in terms of infrastructure and a prominent application for the Semantic Web. In this paper we present three advances in exploiting the opportunity of semantically enriched network data: (1) an ontology for the representation of social networks and relationships; (2) a hybrid system for online data acquisition that combines traditional web mining techniques with the collection of Semantic Web data; and (3) a case study highlighting some of the possible analyses of this data using methods from Social Network Analysis, the branch of sociology concerned with relational data.
{"title":"Social Networks and the Semantic Web","authors":"P. Mika","doi":"10.1109/WI.2004.128","DOIUrl":"https://doi.org/10.1109/WI.2004.128","url":null,"abstract":"A formal, web-based representation of social networks is both a necessity in terms of infrastructure as well as a prominent application for the Semantic Web. In this paper we present three advances in exploiting the opportunity of semantically-enriched network data: (1) an ontology for the representation of social networks and relationships (2) a hybrid system for online data acquisition that combines traditional web mining techniques with the collection of Semantic Web data (2) a case study highlighting some of the possible analysis of this data using methods from Social Network Analysis, the branch of sociology concerned with relational data.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128934296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current search engines maintain a local repository to improve search efficiency. A crawler periodically polls the remote web pages to update the contents of the local repository. Due to resource limitations, some local pages may be stale. To keep the repository fresh, the crawler is expected to revisit remote web pages in an optimized order and at an optimized frequency. The intuitive metric of freshness of the local repository is defined as the fraction of up-to-date web pages in the repository. This metric is based solely on the repository content and, unfortunately, does not reflect the perspective of the search engine users, for example, how often a web page is queried. We propose a novel weighted metric of repository freshness in which the importance of each web page serves as its weight. This metric takes into account not only the local web pages themselves but also the perspectives of the search engine users. We study the repository synchronization policy under this new metric, compare this metric with others, analyze its features, and discuss how web page importance is determined.
{"title":"A Weighted Freshness Metric for Maintaining Search Engine Local Repository","authors":"Jianchao Han, N. Cercone, Xiaohua Hu","doi":"10.1109/WI.2004.17","DOIUrl":"https://doi.org/10.1109/WI.2004.17","url":null,"abstract":"Current search engines maintain a local repository to improve the search efficiency. A crawler is used to periodically poll the remote web pages to update the contents of the local repository. Due to the resource limitations, some local pages may be stale. To maintain the high freshness of the repository, the crawler is expected to revisit remote web pages in optimized order and frequency. The intuitive metric of freshness of the local repository is defined as the fraction of up-to-date web pages in the repository, which is merely based on the repository content, and does not, unfortunately, reflect the perspective of the search engine users, e.g., how often is a web page queried? We propose a novel weighted metric of the repository freshness with the importance of web pages being the weights. This metric not only takes into account the local web pages themselves but also the perspectives of the search engine users. We study the repository synchronization policy under this new metric, compare this metric with others, analyze its features, and discuss how the web page importance is determined.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
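For the weighted freshness metric proposed by Han, Cercone, and Hu, the contrast with the plain fraction-of-fresh-pages metric can be made concrete with a short sketch. The abstract does not give the formula, so the normalization by total importance used here is an assumption, as are the function names and the example weights (which could stand for query frequencies).

```python
from typing import Sequence


def freshness(up_to_date: Sequence[bool]) -> float:
    """Unweighted freshness: fraction of local pages that are up to date."""
    if not up_to_date:
        raise ValueError("repository is empty")
    return sum(up_to_date) / len(up_to_date)


def weighted_freshness(up_to_date: Sequence[bool],
                       importance: Sequence[float]) -> float:
    """Importance-weighted freshness.

    Assumption (not stated in the abstract): the metric is the total
    importance of the fresh pages divided by the total importance of
    all pages in the repository.
    """
    if len(up_to_date) != len(importance):
        raise ValueError("one importance weight is required per page")
    total = sum(importance)
    if total <= 0:
        raise ValueError("total importance must be positive")
    fresh_weight = sum(w for fresh, w in zip(up_to_date, importance) if fresh)
    return fresh_weight / total


# Hypothetical repository of four pages; weights are illustrative only.
pages_fresh = [True, False, True, False]
pages_importance = [6.0, 2.0, 1.0, 1.0]

plain = freshness(pages_fresh)
weighted = weighted_freshness(pages_fresh, pages_importance)
```

In this example half of the pages are stale, but because the most important page is fresh, the weighted value is higher than the unweighted one. Under such a metric a crawler would have an incentive to refresh frequently queried pages first.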
Diamanto Oikonomopoulou, Maria Rigou, S. Sirmakessis, A. Tsakalidis
Understanding and modeling user online behavior, as well as predicting future requests, remains an open challenge for researchers, analysts and marketers. In this paper, we propose an efficient prediction schema based on the extraction of sequential navigation patterns from server log files, combined with web site topology. Traversed paths are monitored, internally recorded and cleaned before being completed with cached page views. Session and episode identification is followed by the construction of n-grams. Prediction is based upon a 5 + n-gram schema with all lower level n-grams participating, a procedure that resembles the construction of an All 5th-order Markov Model. The schema achieves full coverage while maintaining competitive prediction precision.
{"title":"Full-Coverage Web Prediction based on Web Usage Mining and Site Topology","authors":"Diamanto Oikonomopoulou, Maria Rigou, S. Sirmakessis, A. Tsakalidis","doi":"10.1109/WI.2004.71","DOIUrl":"https://doi.org/10.1109/WI.2004.71","url":null,"abstract":"Understanding and modeling user online behavior, as well as predicting future requests remain an open challenge for researchers, analysts and marketers. In this paper, we propose an efficient prediction schema based on the extraction of sequential navigation patterns from server log files, combined with web site topology. Traversed paths are monitored, internally recorded and cleaned before being completed with cashed page views. After session and episode identification follows the construction of n-grams. Prediction is based upon a 5 + n-gram schema with all lower level n-grams participating, a procedure that resembles the construction of an All 5th-order Markov Model. The schema achieves full coverage while maintaining competitive prediction precision.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116922502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
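The idea of letting all lower-order n-grams participate in prediction, as in the abstract by Oikonomopoulou et al., can be sketched as a simple back-off predictor over navigation sessions. This is an illustration of a generic all-k-th-order Markov scheme, not the authors' implementation: the function names, the toy sessions, and the most-frequent-successor rule are assumptions, and the site-topology component that the paper uses is not modelled here.

```python
from collections import Counter, defaultdict


def train(sessions, max_order=5):
    """Count successors for every context of length 1..max_order.

    `sessions` is a list of page-id sequences, one per user session.
    """
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context this long
                context = tuple(session[i - k:i])
                model[context][session[i]] += 1
    return model


def predict(model, history, max_order=5):
    """Predict the next page from the longest context seen in training.

    Falls back to shorter contexts when a longer one is unknown and
    returns None if even the last page alone was never observed.
    """
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None


# Hypothetical sessions, for illustration only.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "cart"],
    ["about", "products", "specs"],
]
model = train(sessions)

next_after_home = predict(model, ["home", "products"])
next_after_about = predict(model, ["about", "products"])
next_after_unknown = predict(model, ["faq", "products"])
```

The third call shows the back-off: the two-page context was never seen, so the prediction comes from the one-page context alone. This sketch still returns None for a page that never appeared in training, so it does not by itself reach the full coverage the abstract reports.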
This paper discusses the approach taken by the Rule Markup Language (RuleML) Initiative towards a general Web rule language framework and relates it to the MDA and UML standards of the Object Management Group (OMG). It also presents the abstract syntax of RuleML 0.85 as a MOF/UML model and considers the possibility of integrating RuleML with OCL and Action Semantics.
{"title":"The Abstract Syntax of RuleML - Towards a General Web Rule Language Framework","authors":"Gerd Wagner, G. Antoniou, Said Tabet, H. Boley","doi":"10.1109/WI.2004.134","DOIUrl":"https://doi.org/10.1109/WI.2004.134","url":null,"abstract":"This paper discusses the approach taken by the Rule Markup Language (RuleML) Initiative towards a general Web rule language framework and relates it to the MDA and UML by the Object Management Group (OMG). It also presents the abstract syntax of RuleML 0.85 as a MOF/UML model and considers the possibility to integrate RuleML with OCL and Action Semantics.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125953307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We are interested in the design of policies for virtual communities of agents based on the grid infrastructure. In a virtual community, agents can play both the role of resource consumers and the role of resource providers, and they remain in control of their resources. We argue that this requirement creates a distinction between two dimensions: global vs local policies, and centralized vs decentralized control by means of policies. The providers should be able to specify local policies on their own resources, but these policies should be consistent with the global policies. At the same time, some aspects of the decentralized control should be delegated to specialized providers; this delegation requires a distinction between the authorization to access a resource and a permission to do so.
{"title":"Local vs Global Policies and Centralized vs Decentralized Control in Virtual Communities of Agents","authors":"G. Boella, Leendert van der Torre","doi":"10.1109/WI.2004.89","DOIUrl":"https://doi.org/10.1109/WI.2004.89","url":null,"abstract":"We are interested in the design of policies for virtual communities of agents based on the grid infrastructure. In a virtual community agents can play both the role of resource consumers and the role of resource providers, and they remain in control of their resources. We argue that this requirement creates a distinction between two dimensions: global vs local and centralized and decentralized control by means of policies. The providers should be enabled to specify their local policies on their own resources, but their policies should be consistent with the global policies. At the same time, some aspects of the decentralized control should be delegated to specialized providers; this delegation requires a distinction between the authorization to access a resource and a permission to do so.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126028006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saravadee Sae Tan, Gan Keng Hoon, E. Tang, Cheong Sook Lin, Chan Siew Lin, Foo Wen Ying
With search results from various search sources offering broad coverage, combining and organizing these results in a meaningful way has become a common issue in the field of information retrieval. In this demo paper, we describe our meta search system, MICE, which aggregates and classifies search results based on user-customized categories. Categories help the user focus on search results with respect to the category concepts that the user has customized.
{"title":"MICE: Aggregating and Classifying Meta Search Results into Self-Customized Categories","authors":"Saravadee Sae Tan, Gan Keng Hoon, E. Tang, Cheong Sook Lin, Chan Siew Lin, Foo Wen Ying","doi":"10.1109/WI.2004.96","DOIUrl":"https://doi.org/10.1109/WI.2004.96","url":null,"abstract":"Having broad coverage of search results returned by various search sources, combining and organizing these results in a meaningful way has become a common issue in the field of information retrieval. In this demo paper, we describe our meta search system, MICE, that is able to aggregate and classify search results based on user-customized categories. Categories help user to focus on search results, with respect to the categories concept customized by the user.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123738968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge personalization is currently the most investigated issue in the context of service-oriented systems on the Web. Knowledge representation and management are the critical issues for knowledge personalization and are being widely investigated, mainly due to the explosion of data modeling technologies such as XML and XML Schema. Despite some progress, a widely approved standard for delivering knowledge is still missing. In this paper we propose a new approach for representing, managing, and delivering knowledge on the Web, together with the corresponding framework that implements it, called Distributed Knowledge Networks (DKN). We also provide a reference architecture for DKN and some experimental results on knowledge personalization.
{"title":"Knowledge on the Web: Making Web Services Knowledge-Aware","authors":"A. Cuzzocrea","doi":"10.1109/WI.2004.87","DOIUrl":"https://doi.org/10.1109/WI.2004.87","url":null,"abstract":"Knowledge personalization is currently the most investigated issue in the context of service-oriented systems on the Web. Knowledge representation and management are the critical issues for knowledge personalization, and actually are currently being widely investigated, mainly due to the explosion of data modeling technologies such as XML and XML Schema. Despite some progress, a widely approved standard for delivering knowledge is still missing. In this paper we propose a new approach for representing, managing, and delivering knowledge on the Web and the correspondent framework, called Distributed Knowledge Networks (DKN), that implements it. We also provide a reference architecture for DKN and some experimental results about knowledge personalization.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125351880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The objective of the WIC Japan Research Centre is to carry out basic research concerning certain aspects of Web Intelligence (WI). Our research activities focus on investigating WI technologies for developing various portals that enable intelligence for e-science, e-business, e-government, and e-learning, and that deal efficiently and effectively with the scalability and complexity of the real world. We observe that developing intelligent portals is one of the most sophisticated applications that needs to be supported by WI technologies. Research work that has been carried out can be categorized and described as follows.
{"title":"Using WI Technologies to Develop Intelligent Portals - Research Activities at the WIC Japan Center -","authors":"N. Zhong","doi":"10.1109/WI.2004.156","DOIUrl":"https://doi.org/10.1109/WI.2004.156","url":null,"abstract":"The objective of the WIC Japan Research Centre is to carry out basic research concerning certain aspects of Web Intelligence (WI). Our research activities focus on investigating WI technologies for developing various portals that enable intelligence for e-science, e-business, e-government, and e-learning, as well as deal with the scalability and complexity of real world, efficiently and effectively. We observe that developing intelligent portals is one of the most sophisticated applications, which needs to be supported by WI technologies. Research work that has been carried out can be categorized and described as follows.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122824097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Link context has been exploited extensively ever since the advent of the World Wide Web, but approaches to extracting precise link context have not been fully explored, and many state-of-the-art extraction methods are based on simplistic heuristics and require ad hoc parameters. In this paper, we propose a novel two-step extraction model, which aims to systematically derive link context of a quality as high as that of anchor text. In the macroscopic analysis step, a systematic web page structure analysis is performed to locate the content-cohesive text region and potentially relevant header or header-like tags. In the microscopic extraction step, an English parser is used to extract the relevant sentence fragments in the text region, and the nearest heading text is included if the need arises. Preliminary experimental results indicate that the approach is effective.
{"title":"Extracting Precise Link Context Using NLP Parsing Technique","authors":"Qingyang Xu, Wanli Zuo","doi":"10.1109/WI.2004.68","DOIUrl":"https://doi.org/10.1109/WI.2004.68","url":null,"abstract":"Link context has been exploited extensively ever since the advent of the World Wide Web, but the approach to extracting precise link context has not been fully explored and many state-of-the-art extraction methods are based on simplistic heuristics and require ad-hoc parameters. In this paper, we propose a novel two-step extraction model, which aims to systematically derive link context of quality as high as anchor text. In the macroscopic analysis step, a systematic web page structure analysis is performed to locate the content cohesive text region and potential relevant header or header like tags. In the microscopic extraction step, an English parser is used to extract the relevant sentence fragments in the text region and the nearest heading text is encompassed if the need arises. Preliminary experimental results proved our approach's effectiveness.","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129548754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although search engines play a crucial role in the retrieval of information from the Web, they cannot guarantee the quality required for most relevant business activities as well as for many top-level research projects. In this paper we present MumbleSearch, a Web Content Monitor designed specifically to extract and organize topic-based information with emphasis on quality requirements. We present the architecture of the software platform and its deployment in a real-world application involving Italian Small and Medium Enterprises (SME).
{"title":"MumbleSearch Extraction of High Quality Web information for SME","authors":"N. Baldini, M. Gori, Marco Maggini","doi":"10.1109/WI.2004.102","DOIUrl":"https://doi.org/10.1109/WI.2004.102","url":null,"abstract":"Although search engines are playing a crucial role for the retrieval of information from the Web, they cannot guarantee the quality required for most relevant business activities as well as for many top-level research projects. In this paper we present MumbleSearch, a Web Content Monitor which is especially conceived to extract and organize topic-based information with emphasis on quality requirements. We present the architecture of the software platform and its deployment for a real-world application, involving Italian Small and Medium Enterprises (SME).","PeriodicalId":229107,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129148676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}