Pub Date: 1900-01-01 | DOI: 10.1504/IJKWI.2017.10010794
R. Wadawadagi, V. Pagi
Participating in social networks to create and share opinion content has become a ubiquitous part of everyday life, and understanding social media content is at the top of the agenda for many firms today. Business analysts and quants are working to discover how enterprises can benefit from understanding the content generated through social media such as Facebook, Wikipedia, blogs, YouTube and Twitter. This pioneering work may give business analysts and data scientists insights into ways of adapting established content analysis (CA) techniques to analyse web pages containing user-generated data. In this paper, we develop an integrated enterprise framework that defines web content analysis (WCA) as a comprehensive, functional layered architecture, so that the framework can be used at various levels of the decision-making process. Further, a four-dimensional comparative analysis of various WCA systems is presented. Based on a critical analysis of the literature, the study identifies many open and challenging issues for further research in this domain.
Title: An enterprise perspective of web content analysis research: a strategic road-map (Int. J. Knowl. Web Intell.)
Pub Date: 1900-01-01 | DOI: 10.1504/IJKWI.2016.10005796
Rachna Miglani
In a world where everything is becoming specialised, data warehouses and web mining techniques are becoming specialised too. Web mining techniques fall into three categories according to the data being mined: web usage mining, web content mining, and web structure mining. The Apriori algorithm, the FP-growth algorithm, and the average linear time algorithm are available to analyse general access patterns in web server logs, whereas WCOND-mine and the signed with weight technique are web content outlier mining algorithms. However, no algorithm is available to check the authenticity and availability of hyperlinks in the web pages returned by web search engines. The present work aims to detect outliers in the results of queries issued over web pages through web search engines.
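As a rough illustration of the access-pattern mining mentioned above, the following sketch applies a level-wise, Apriori-style search to sets of pages visited together in a session. It is not the WSOLINK algorithm and is not taken from the paper; the function name `frequent_page_sets`, the toy sessions, and the `min_support` threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_page_sets(sessions, min_support, max_size=3):
    """Return {frozenset(pages): session_count} for page sets that occur
    together in at least `min_support` sessions (Apriori-style search)."""
    sessions = [frozenset(s) for s in sessions]

    # Level 1: count how many sessions contain each single page.
    counts = Counter(page for s in sessions for page in s)
    current = {frozenset([p]): c for p, c in counts.items() if c >= min_support}
    frequent = dict(current)

    size = 2
    while current and size <= max_size:
        # Only pages that are still frequent can appear in larger sets.
        items = sorted({p for itemset in current for p in itemset})
        candidates = [
            frozenset(c)
            for c in combinations(items, size)
            # Apriori pruning: every (size - 1)-subset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for cand in candidates:
            support = sum(1 for s in sessions if cand <= s)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        size += 1
    return frequent


# Hypothetical sessions, as they might be reconstructed from a server log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/blog"],
    ["/products", "/cart"],
]
result = frequent_page_sets(sessions, min_support=2)
for pages, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pages), support)
```

The pruning step is what makes the search Apriori-style: a candidate set is counted only if all of its smaller subsets were already found to be frequent, which keeps the number of passes over the sessions small.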
Title: WSOLINK: web structure outlier detection algorithm (Int. J. Knowl. Web Intell.)
Pub Date: 1900-01-01 | DOI: 10.1504/IJKWI.2017.10010171
K. R. Kumar, D. T. Santosh, B. V. Vardhan
Opinion words express a user's likes and dislikes about target entities, such as products and product aspects, in online reviews. The polarised information collected from the reviews is analysed by calculating the orientation of the adjectives. The synonymy relation graph is one way to determine the orientation of the adjectives in a product review dataset: it considers the minimum path length between the adjectives under analysis using WordNet synsets. However, the synonymy relation graph cannot determine the orientation of every opinion word in the dataset. To evaluate the opinion orientation of all adjectives in the dataset, the WordNet synonymy relation graph is replaced with the SentiWordNet scores of the opinion words. These scores are assigned to the opinion words by using the contextual clues surrounding them to disambiguate their sense. The contextual clues are determined from typed-dependency grammatical relations. The distance between an opinion word and a context-insensitive seed term (good/bad) is computed as the difference between their scores. This paper addresses the advantages of using SentiWordNet scores, which improve the accuracy of the determined opinion word orientations.
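The contrast between the two approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synonymy graph `SYNONYMS` and the `(positive, negative)` pairs in `SCORES` are invented toy data rather than real WordNet or SentiWordNet entries, and combining the two scores as positive minus negative is an assumption, since the abstract does not say how the scores are combined.

```python
from collections import deque

# Hypothetical toy synonymy graph (NOT real WordNet data).
SYNONYMS = {
    "good": {"fine", "solid"},
    "fine": {"good", "decent"},
    "decent": {"fine"},
    "solid": {"good"},
    "bad": {"poor"},
    "poor": {"bad", "weak"},
    "weak": {"poor"},
    "pricey": set(),  # isolated word: no synonym path to either seed
}

# Hypothetical (positive, negative) scores in the style of SentiWordNet.
SCORES = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "decent": (0.5, 0.0),
    "weak": (0.0, 0.5),
    "pricey": (0.0, 0.25),
}


def path_length(graph, start, goal):
    """Breadth-first search; returns the minimum path length or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def graph_orientation(word):
    """Orientation from path lengths to the seeds; None if undecidable."""
    to_good = path_length(SYNONYMS, word, "good")
    to_bad = path_length(SYNONYMS, word, "bad")
    if to_good is None and to_bad is None:
        return None
    if to_bad is None:
        return "positive"
    if to_good is None:
        return "negative"
    if to_good == to_bad:
        return "neutral"
    return "positive" if to_good < to_bad else "negative"


def net_score(word):
    pos, neg = SCORES[word]
    return pos - neg  # assumption: net polarity = positive - negative


def score_orientation(word):
    """Orientation from the score difference to each seed term."""
    to_good = abs(net_score("good") - net_score(word))
    to_bad = abs(net_score("bad") - net_score(word))
    if to_good == to_bad:
        return "neutral"
    return "positive" if to_good < to_bad else "negative"


for w in ("decent", "weak", "pricey"):
    print(w, graph_orientation(w), score_orientation(w))
```

In this toy data the word "pricey" has no synonym path to either seed, so the graph method returns no orientation, while the score-based method still assigns one. That mirrors the limitation of the synonymy relation graph described in the abstract.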
Title: Determining the semantic orientation of opinion words using typed dependencies for opinion word senses and SentiWordNet scores from online product reviews (Int. J. Knowl. Web Intell.)
Pub Date: 1900-01-01 | DOI: 10.1504/IJKWI.2011.045163
R. Dutta, A. Kundu, Debajyoti Mukhopadhyay
Web page prediction reduces user latency by predicting and fetching the probable web page of the next request in advance. Users surf the internet by entering a URL, by searching for a topic, or by following links on the same topic. Clustering plays an important role in both searching and link prediction. Besides the topic, navigational behaviour is also taken into account. This paper proposes a web page prediction model that gives significant importance to the user's interest, captured using a clustering technique, and to the user's navigational behaviour, captured through a Markov model. The clustering technique is used to group similar web pages: web pages of the same type reside in the same cluster, and the pages in a cluster are similar with respect to the topic of the session. The clustering algorithms considered are K-means and K-medoids, where K is determined by the HITS algorithm. Finally, the predicted web pages are stored in the form of cellular automata to make the system more memory efficient.
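The navigational part of such a model can be sketched with a first-order Markov chain over page requests. The example below is a simplified illustration, not the paper's model: it omits the clustering, HITS, and cellular automata components, and the helper names `train_transitions` and `predict_next` as well as the sample sessions are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count page-to-page transitions across all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, page, top_k=1):
    """Return up to `top_k` (next_page, probability) pairs for `page`,
    most probable first; an empty list if the page was never seen."""
    followers = counts.get(page)
    if not followers:
        return []
    total = sum(followers.values())
    # Sort by descending count, then by name for a deterministic order.
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(nxt, n / total) for nxt, n in ranked[:top_k]]


# Hypothetical navigation sessions.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news", "/weather"],
    ["/home", "/shop"],
    ["/news", "/sports"],
]
model = train_transitions(sessions)
prediction = predict_next(model, "/news", top_k=2)
print(prediction)
```

A prefetcher built on this sketch would fetch the top-ranked page while the user is still reading the current one. Restricting candidates to pages in the same topic cluster, as the abstract describes, would be a further filtering step on top of this ranking.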
Title: Clustering-based web page prediction (Int. J. Knowl. Web Intell.)