Effectively and efficiently detect web page duplication
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356801
Zhongming Han, Qian Mo, Hongzhi Liu, Jianzhi Sun
There are many redundant web pages on the Internet. In this paper, we present a novel multilayer framework for detecting duplicated web pages based on tag statistics and text similarity comparison. We propose two algorithms for detecting similar text paragraphs and implement our framework. The experimental results show that our approach achieves high performance, which means that duplicated web pages can be detected efficiently by tag statistics and text comparison alone.
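The paper's two detection algorithms are not reproduced here. As a rough illustration of the general idea described in the abstract (a cheap tag-statistics filter followed by paragraph-level text similarity), here is a minimal sketch; the tag-counting rule, shingle size, and thresholds are illustrative assumptions, not the authors' parameters.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Collect a frequency count of HTML tags as a cheap structural signature."""
    def __init__(self):
        super().__init__()
        self.tags = Counter()
    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

def tag_signature(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    return parser.tags

def cosine(a: Counter, b: Counter) -> float:
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    norm = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

def paragraph_similarity(p1: str, p2: str, k: int = 5) -> float:
    """Jaccard similarity over k-word shingles of two text paragraphs."""
    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}
    s1, s2 = shingles(p1), shingles(p2)
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def looks_duplicated(html1, html2, paras1, paras2,
                     tag_threshold=0.9, text_threshold=0.8):
    """Two-layer check: tag-statistics filter first, then paragraph comparison."""
    if cosine(tag_signature(html1), tag_signature(html2)) < tag_threshold:
        return False
    matches = sum(1 for p in paras1
                  if any(paragraph_similarity(p, q) >= text_threshold for q in paras2))
    return matches >= 0.8 * max(1, len(paras1))
```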
{"title":"Effectively and efficiently detect web page duplication","authors":"Zhongming Han, Qian Mo, Hongzhi Liu, Jianzhi Sun","doi":"10.1109/ICDIM.2009.5356801","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356801","url":null,"abstract":"There are a lot of redundant web pages on Internet. Based on tag statistic and text similarity comparison, we present a novel multilayer framework for detecting duplicated web pages in this paper. We propose two similarity text paragraphs detection algorithms and implement our framework. The experimental results show that our approach achieves high performance, which means that duplicated web pages can be efficiently detected simply by tag statistic and text comparison.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114808208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Translating Persian documents into English using knowledge based WSD
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356770
Chakaveh Saedi, M. Shamsfard
The need for machine translation systems is growing rapidly due to the increase in documents and translation requests. Reading web pages, news, articles, manuals, and user guides, and getting the gist of a text are some examples of our daily need for translation. This paper introduces a Persian-to-English machine translation system named PEnT2. It uses a transfer-based approach and employs the grammatical roles of the sentence words as the main clue for performing the translation process. PEnT2 translates simple Persian sentences into English using a hybrid approach. It exploits the advantages of rule-based, knowledge-based, and corpus-based methods in different components of a machine translation system, including word sense disambiguation, structural transfer, and structure optimization. Experiments show improved results in comparison with other available systems.
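The abstract names knowledge-based word sense disambiguation as one component, but the paper's actual WSD procedure is not detailed above. Purely as a hedged illustration of what a knowledge-based, gloss-overlap (Lesk-style) disambiguator looks like, the sketch below picks the sense whose dictionary gloss shares the most words with the sentence context; the toy sense inventory is invented for the example and is not PEnT2's lexicon.

```python
def lesk_style_wsd(word: str, context: str, sense_inventory: dict) -> str:
    """Pick the sense whose gloss overlaps most with the context words.

    sense_inventory maps words to {sense_id: gloss}; this toy inventory
    stands in for a real lexical knowledge base.
    """
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory.get(word, {}).items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical two-sense entry for the English word "bank".
inventory = {
    "bank": {
        "bank_financial": "an institution that accepts deposits and lends money",
        "bank_river": "the sloping land beside a body of water such as a river",
    }
}
print(lesk_style_wsd("bank", "she deposited money at the bank", inventory))
# -> bank_financial
```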
{"title":"Translating Persian documents into English using knowledge based WSD","authors":"Chakaveh Saedi, M. Shamsfard","doi":"10.1109/ICDIM.2009.5356770","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356770","url":null,"abstract":"The necessity of machine translation systems is growing rapidly due to the increase of documents and translation requests. Reading web pages, news, articles, manuals and users' guides and getting the gist of a test are some of the examples for our daily need to translation. This paper introduces a Persian to English machine translation system, named PEnT2. It uses a transfer based approach and employs the grammatical role of the sentence words as the main clue to perform the translation processes. PEnT2 translates simple Persian sentences into English using a hybrid approach. It exploits the advantages of rule based, knowledge based and corpus based methods in different components of a machine translation system including word sense disambiguation, structural transfer and structure optimization. Experiments show improved results in comparison with other available systems.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122587552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Document cluster detection on latent projections
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356765
Dora Alvarez-Medina, H. Hidalgo-Silva
Probabilistic text data modeling is usually carried out with Bernoulli or multinomial event models. A main problem in text mining is the large number of zero counts in the matrix representation. Recently, a document visualization technique incorporating the Zero-Inflated Poisson model into the Generative Topographic Mapping algorithm has been proposed. This probabilistic model can be applied as a text document visualization tool. In this work, an algorithm for automatically extracting clusters from the visualization results is presented. The combination of visualization and cluster extraction algorithms allows document collections to be explored and evaluated. Several results are presented for the 20-Newsgroups and Reuters data sets.
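For reference, the Zero-Inflated Poisson model mentioned in the abstract mixes a point mass at zero with an ordinary Poisson component; a standard formulation (not taken from this paper) is:

```latex
P(X = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(X = k) = (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 1, 2, \dots
```

where \(\pi\) is the probability of a structural zero and \(\lambda\) is the Poisson rate; the extra mass at zero is what makes this family a better fit for sparse term-count matrices than a plain Poisson or multinomial model.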
{"title":"Document cluster detection on latent projections","authors":"Dora Alvarez-Medina, H. Hidalgo-Silva","doi":"10.1109/ICDIM.2009.5356765","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356765","url":null,"abstract":"Probabilistic text data modeling is usually considered with Bernoulli or multinomial event models. The main problem of text mining is the large amount of zero account in the matrix representation. Recently a document visualization technique incorporating the Zero Inflated Poisson model in the Generative Topographic Mapping algorithm has been proposed. This probabilistic model can be applied as a text document visualization tool. In this work, an algorithm for automatically extracting the clusters in the visualization results is presented. The combination of visualization-cluster extraction algorithms allows to obtain and evaluate document collections. Several results are presented for 20-Newsgroups and Reuters data.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121173484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature based similarity search in simplified surface 3D model using interpolation method
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356790
A. Kim, O. Gwun, Juwhan Song
This paper proposes a feature descriptor for 3D model similarity search that uses the distribution of normal directions on a simplified surface. A feature descriptor for a 3D model should be invariant to translation, rotation, and scale of the model. Therefore, this paper normalizes all models using PCA and applies surface mesh simplification as a preprocessing step to make the descriptor robust against noise. Normals are sampled in proportion to each polygon's area and then computed by a weighted average over angles and interpolated. We implemented a 3D model retrieval system and performed similarity search tests on the shape benchmark data provided by Princeton University. Experimental results show improvements of 24.7% to 32.2% over conventional methods in terms of ANMRR.
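The descriptor itself, with its angle-weighted averaging and interpolation, is specific to the paper and not reproduced here. As a loose illustration of the underlying idea (a histogram of face normal directions weighted by face area), here is a minimal numpy sketch; the bin layout is an assumption, and the PCA normalization and mesh simplification steps are omitted for brevity.

```python
import numpy as np

def normal_direction_histogram(vertices: np.ndarray, faces: np.ndarray,
                               n_theta: int = 8, n_phi: int = 16) -> np.ndarray:
    """Area-weighted histogram of triangle normal directions.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Returns a flattened, L1-normalized histogram of size n_theta * n_phi.
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    cross = np.cross(v1 - v0, v2 - v0)                      # face normal scaled by 2 * area
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    normals = cross / np.maximum(np.linalg.norm(cross, axis=1, keepdims=True), 1e-12)

    # Spherical coordinates of each unit normal.
    theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))     # polar angle in [0, pi]
    phi = np.mod(np.arctan2(normals[:, 1], normals[:, 0]), 2 * np.pi)

    t_bin = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)
    p_bin = np.minimum((phi / (2 * np.pi) * n_phi).astype(int), n_phi - 1)

    hist = np.zeros((n_theta, n_phi))
    np.add.at(hist, (t_bin, p_bin), areas)                   # weight each bin by face area
    hist = hist.ravel()
    return hist / hist.sum() if hist.sum() > 0 else hist

def descriptor_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """L1 distance between two normal-direction histograms."""
    return float(np.abs(h1 - h2).sum())
```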
{"title":"Feature based similarity search in simplified surface 3D model using interpolation method","authors":"A. Kim, O. Gwun, Juwhan Song","doi":"10.1109/ICDIM.2009.5356790","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356790","url":null,"abstract":"This paper proposes the feature descriptor for 3D model similarity search using the distribution of normal directions on the simplified surface. Feature descriptor of 3D model should be invariant to translation, rotation and scale for its model. So this paper normalizes all the model using PCA and preprocesses surface mesh simplification to robust against noise. The normal is sampled in proportion to each polygon's area and then it is calculated by weight average method via angles and interpolated. We implemented the 3D model retrieval system and performed the similarity search test with the shape bench mark data provided by the Princeton University. Experimental results show the performance improvement of proposed algorithm from 24.7% to 32.2% in comparison with conventional methods by ANMRR.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126362232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving manufacturing efficiency at Ford using product centred knowledge management
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356779
M. Raza, T. Kirkham, R. Harrison, Quentin Hugues Reul
Western manufacturing is under pressure to produce high-quality customized products at low cost, particularly in large-scale manufacturing such as the car industry. The development of such product customization requires the adoption of innovative agile manufacturing techniques. To date, this innovation has focused on improved process development between the different stages of manufacturing Product Lifecycle Management (PLM). However, in terms of implementation, data management techniques have lagged behind, often leaving these processes disjointed and lacking in automation. This paper proposes an improved model based on innovation in manufacturing PLM. Building on existing work on the use of ontologies for knowledge management, the paper applies these techniques to PLM. The implementation has been applied to a case study around a Ford production line. The prototype presents an innovative approach to PLM and was tested using a state-of-the-art Web Service infrastructure implemented on a Ford Powertrain test rig.
{"title":"Improving manufacturing efficiency at ford using product centred knowledge management","authors":"M. Raza, T. Kirkham, R. Harrison, Quentin Hugues Reul","doi":"10.1109/ICDIM.2009.5356779","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356779","url":null,"abstract":"Western manufacturing is under pressure to produce high quality customized products particularly in large manufacturing such as the car industry at low costs. Here the development of such product customization requires the adoption of innovative agile manufacturing techniques. To date this innovation has focused on the improved process development between the different stages of manufacturing Product Lifecycle Management (PLM). However in terms of implementing the application, data management techniques have lagged behind often leaving these processes disjointed and lacking in automation. This paper proposes an improved model based on innovation in the manufacturing PLM. Building on existing work in the use of ontologies for knowledge management, the paper applies these techniques to PLM. The implementation has been applied to develop a case study around a Ford production line. The prototype presents an innovative approach to PLM and tested using a state of the art Web Service infrastructure implemented on a Ford Powertrain test rig.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127859964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering political tendency in bulletin board discussions by social community analysis
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356800
Kang-Che Lee, M. Shan
The Bulletin Board System (BBS) is very popular and provides an asynchronous, text-based environment for users to exchange information and ideas. A BBS consists of a number of discussion boards, each of which focuses on a particular subject. A discussion on a topic consists of a seed article followed by articles responding to the seed article or to other responses. This paper investigates social community analysis techniques to discover the political tendency of users within the boards from their discussions. We first extract the social interactions between users, such as "reply" and "advocate" relations between posts. A social network among users is constructed based on the extracted social interactions. After building the social network, we employ graph partitioning, graph coloring, and graph clustering algorithms, respectively, to discover the social communities. Users in the same community are more likely to agree with each other's political opinions. Using this approach, we are able to partition users into two opposing groups and identify their political tendency effectively without linguistic analysis of the discussion content.
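The pipeline summarized above (graph partitioning, coloring, and clustering over reply/advocate interactions) is the paper's; as a minimal, hedged sketch of just the partitioning step, the following builds a weighted interaction graph and bisects it with the Kernighan-Lin heuristic from networkx. The edge-weighting convention and the toy interaction list are illustrative assumptions, not the authors' settings.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Hypothetical (user_a, user_b, interaction_type) records extracted from a board.
interactions = [
    ("alice", "bob", "advocate"),
    ("alice", "carol", "reply"),
    ("bob", "dave", "reply"),
    ("carol", "dave", "advocate"),
    ("erin", "alice", "advocate"),
    ("erin", "dave", "reply"),
]

# Build a weighted interaction graph; here "advocate" ties count more than
# plain replies (an illustrative convention, not the paper's).
G = nx.Graph()
for a, b, kind in interactions:
    w = 2.0 if kind == "advocate" else 1.0
    if G.has_edge(a, b):
        G[a][b]["weight"] += w
    else:
        G.add_edge(a, b, weight=w)

# Bisect the network into two candidate communities of users.
group1, group2 = kernighan_lin_bisection(G, weight="weight")
print(sorted(group1), sorted(group2))
```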
{"title":"Discovering political tendency in bulletin board discussions by social community analysis","authors":"Kang-Che Lee, M. Shan","doi":"10.1109/ICDIM.2009.5356800","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356800","url":null,"abstract":"Bulletin Board System (BBS) is very popular and provide an asynchronous, text-based environment for users to exchange information and idea. A BBS consists of a number of discussion boards, each of which focuses on a particular subject. A discussion on a topic consists of a seed articles followed by some articles responsive to the seed article or other responsive articles. This paper investigates the social community analysis technique to discover the political tendency of users within the boards from discussions. We first extract the social interactions between users, such as \"reply\" and \"advocate\" of posts between users. A social network among users is constructed based on the extracted social interaction. After building the social network, we employ the graph partition, graph coloring, and graph clustering algorithms respectively to discover the social communities. Users of the same community have more potential of political opinion agreement with each other. By using this approach, we are able to partition users into two opposite groups and identify their political tendency effectively without linguistic analysis of discussion content.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"98 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128003259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confounded factor effects on battery life in wireless sensor networks
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356794
N. Xu, K. Subbu, Shijun Tang
Widely utilized in numerous applications, wireless sensor networks have become a boon to the academic and industrial communities. However, stringent energy constraints hinder the continuous operation of the sensing devices. Clustering a network leads to fewer nodes participating in active transmissions, and data aggregation reduces redundant packet sending. Given that transmission is the primary energy-consuming activity of a sensor mote, this paper exploits the joint advantages of clustering and data aggregation in decreasing communication cost. A data-centric analysis was performed to justify the use of data aggregation. Experimental results on a realistic platform with MICAz motes running the TinyOS embedded operating system showed 22% energy savings and a 13% overhead reduction, confirming the attractive advantages of data aggregation and clustering, such as more efficient transmissions.
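As a back-of-the-envelope illustration of why clustering plus aggregation cuts communication cost (not a reproduction of the paper's MICAz experiment), the sketch below compares transmit cost for a flat network, where every node reports directly to the sink, against a clustered one, where each cluster head averages its members' readings into a single forwarded packet. The node count, cluster size, and per-packet cost constants are arbitrary assumptions.

```python
def flat_cost(n_nodes: int, long_cost: float = 1.0) -> float:
    """Every node sends its own reading all the way to the sink."""
    return n_nodes * long_cost

def clustered_cost(n_nodes: int, cluster_size: int,
                   short_cost: float = 0.2, long_cost: float = 1.0) -> float:
    """Members make cheap short-range sends to their cluster head; each head
    aggregates the readings and makes one long-range send to the sink."""
    n_clusters = -(-n_nodes // cluster_size)      # ceiling division
    member_sends = n_nodes - n_clusters           # heads do not send to themselves
    return member_sends * short_cost + n_clusters * long_cost

n = 100
print(flat_cost(n))            # 100.0 units of transmit energy
print(clustered_cost(n, 10))   # 90 * 0.2 + 10 * 1.0 = 28.0 units
```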
{"title":"Confounded factor effects on battery life in wireless sensor networks","authors":"N. Xu, K. Subbu, Shijun Tang","doi":"10.1109/ICDIM.2009.5356794","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356794","url":null,"abstract":"Widely utilized for numerous applications, wireless sensor networks have become a boon to the academia and industrial communities. However, the stringent energy constraints place a hurdle to the continuous functioning of the sensing devices. Clustering a network leads to a lesser number of nodes participating in active transmissions. Data aggregation reduces redundant packet sending. Given that transmission is the primary energy consuming activity for a sensor mote, this paper exploits the joint advantages of clustering and data aggregation in decreasing communication cost. A data centric analysis was performed to justify the use of data aggregation. Experimental results on a realistic platform with MICAz motes running TinyOS embedded system showed 22% energy savings and 13% overhead reduction, confirming the attractive advantages of data aggregation and clustering such as more efficient transmissions.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124953373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On a visual frequent itemset mining
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356762
S. Lim
Given a large, dense transaction database, generating interesting frequent patterns in a user-friendly manner remains an important issue in data mining. This is because the minimum support threshold, the most popular statistical significance measure, is not capable of reflecting the domain user's interests. This paper presents visual frequent itemset mining (VFIM) as an alternative to traditional Apriori-like frequent itemset mining. VFIM pushes the domain user's cognitive power into the data mining process. To this end, a formal visual data mining model is proposed and a prototype of the model is created. The effectiveness of the proposed model is demonstrated by showing that VFIM generates frequent patterns, by means of user interaction, that are compatible with those generated by traditional Apriori-like algorithms without executing them.
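The abstract compares VFIM's output against Apriori-like algorithms. For readers unfamiliar with that baseline, here is a compact, generic Apriori sketch (level-wise generation of itemsets above a minimum support); it illustrates the comparison baseline, not the paper's VFIM procedure.

```python
def apriori(transactions, min_support):
    """Return all itemsets whose support (fraction of transactions that
    contain them) is at least min_support, via level-wise generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent, k = list(current), 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets,
        # then keep only the candidates that meet the support threshold.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.extend(current)
        k += 1
    return [set(s) for s in frequent]

txns = [{"bread", "milk"}, {"bread", "beer", "eggs"},
        {"milk", "beer", "bread"}, {"bread", "milk", "beer"}]
print(apriori(txns, min_support=0.5))
```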
{"title":"On a visual frequent itemset mining","authors":"S. Lim","doi":"10.1109/ICDIM.2009.5356762","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356762","url":null,"abstract":"Given a large, dense transaction database, generating interesting frequent patterns in a user friendly manner remains as an important issue in data mining. It is because the minimum support, the most popular statistical significance measurement, is not capable of reflecting the domain user's interest. This paper presents visual frequent itemset mining (VFIM) as an alternative to the traditional apriori-like frequent itemset mining. VFIM pushes the domain user's cognitive power into the data mining process. To this end, a formal visual data mining model is proposed and a prototype of the model is created. The effectiveness of the proposed model is demonstrated by showing that VFIM generates frequent patterns, by means of user interaction, that are compatible with those generated by traditional apriori-like algorithms without executing them.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A user model for personalization services
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356766
YeSun Joung, M. Zarki, R. Jain
A user model is essential for supporting personalization services. However, the user models used in most systems are designed in an ad-hoc manner and are often tied to their application domains. This hinders service interoperability and increases the amount of work. In this paper, we focus on defining a general user model to represent user information and user context. This paper makes two contributions: a survey of existing context-based systems and a user model for personalization. The survey analyses previous context-based systems and identifies the kinds of features used in those systems. Based on this survey, we propose a user model that captures user information and contexts for personalization applications.
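The paper's actual model is not reproduced above; as a hedged sketch of what a general user model separating static user information from dynamic context might look like, consider the following data structure. All field names are illustrative assumptions, not the authors' schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserContext:
    """Dynamic, situation-dependent information about the user."""
    location: str = ""
    device: str = ""
    time_of_day: str = ""
    activity: str = ""

@dataclass
class UserModel:
    """Static profile plus interests and the current context."""
    user_id: str
    demographics: Dict[str, str] = field(default_factory=dict)   # e.g. age group, language
    preferences: Dict[str, float] = field(default_factory=dict)  # interest -> weight
    history: List[str] = field(default_factory=list)             # ids of consumed items
    context: UserContext = field(default_factory=UserContext)

    def top_interests(self, k: int = 3) -> List[str]:
        """Interests with the highest weights, a typical personalization query."""
        return sorted(self.preferences, key=self.preferences.get, reverse=True)[:k]

u = UserModel(user_id="u42",
              preferences={"sports": 0.9, "news": 0.4, "music": 0.7},
              context=UserContext(location="home", device="mobile"))
print(u.top_interests(2))   # ['sports', 'music']
```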
{"title":"A user model for personalization services","authors":"YeSun Joung, M. Zarki, R. Jain","doi":"10.1109/ICDIM.2009.5356766","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356766","url":null,"abstract":"A user model is essential to support personalization services. However, the user models that are used in most systems are designed in an ad-hoc manner and often related to their application domains. It hinders service interoperability and increases the amount of work. In this paper, we focus on defining a general user model in order to represent user information and the user context. This paper contributes two points: a survey on existing context based systems and a user model for personalization. The survey analyses previous context based systems and provides different kinds of features used in their systems. Based on this survey, we propose a user model to capture user information and contexts for personalization application.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133579735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A rule-based conversion of an object-oriented database schema to a schema in XML Schema
Pub Date: 2009-12-18, DOI: 10.1109/ICDIM.2009.5356777
F. F. F. Peres, R. Mello
Data interchange between different computer systems is a common task today, and the use of XML as an exchange format has been increasing. The XML Schema recommendation allows the definition of an XML structure to be used by applications that exchange data with each other. This paper proposes a rule-based approach for converting object-oriented (OO) database schemata to XML Schema schemata, as well as an algorithm that defines the application of the rules. We consider a schema mapping process in the OO→XML direction based on a detailed analysis of the OO database model concepts. A prototype tool was implemented to validate the process. Compared with related work, our proposal considers the mapping of all OODB model concepts to equivalent data structures in XML.
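The paper's rule set is not reproduced here. As a minimal illustration of one obvious rule of this kind (a class with typed attributes becomes an xs:complexType with a sequence of elements), the sketch below emits XML Schema for a toy class description; the type-mapping table and the example class are illustrative assumptions, not the authors' rules.

```python
# Illustrative OO-to-XSD type mapping; a real rule set would be far richer
# (associations, inheritance, collections, object identity, and so on).
TYPE_MAP = {"string": "xs:string", "int": "xs:integer",
            "float": "xs:decimal", "bool": "xs:boolean"}

def class_to_complex_type(class_name: str, attributes: dict) -> str:
    """Map a class name and {attribute: oo_type} dict to an xs:complexType."""
    lines = [f'<xs:complexType name="{class_name}">', "  <xs:sequence>"]
    for attr, oo_type in attributes.items():
        xsd_type = TYPE_MAP.get(oo_type, "xs:string")
        lines.append(f'    <xs:element name="{attr}" type="{xsd_type}"/>')
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)

print(class_to_complex_type("Customer",
                            {"name": "string", "age": "int", "balance": "float"}))
```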
{"title":"A rule-based conversion of an object-oriented database schema to a schema in XML schema","authors":"F. F. F. Peres, R. Mello","doi":"10.1109/ICDIM.2009.5356777","DOIUrl":"https://doi.org/10.1109/ICDIM.2009.5356777","url":null,"abstract":"Data interchange between different computer systems is a common task today, and the use of XML as an exchanging protocol has increasing. XML Schema recommendation allows the definition of an XML structure to be used for applications that communicate data each other. This paper proposes a rule-based approach for converting object-oriented (OO) database schemata to XML Schema schemata, as well as an algorithm that defines the application of the rules. We consider a schema mapping process in the OO→XML direction based on a detailed analysis of the OO database model concepts. A prototype tool was implemented to validate the process. Compared to related work, our proposal considers the mapping of all OODB model concepts to equivalent data structures in XML.","PeriodicalId":300287,"journal":{"name":"2009 Fourth International Conference on Digital Information Management","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133004445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}