FFTM: optimized frequent tree mining with soft embedding constraints on siblings

M. Sghaier, S. Yahia, Anne Laurent, M. Teisseire
International Conference on Soft Computing as Transdisciplinary Science and Technology, published 2008-10-28
DOI: 10.1145/1456223.1456309 (https://doi.org/10.1145/1456223.1456309)
Citations: 3

Abstract

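Databases keep growing, and so does the volume of data they hold. Knowledge extraction has therefore become an important problem, one that requires a range of techniques to process the available data and extract the information it contains. We focus in particular on data available on the web. For data exchange over the Internet, XML plays an increasingly important role and has become the dominant standard for handling huge volumes of electronic documents. We are especially concerned with extracting knowledge from complex tree structures such as XML documents. Because such documents are heterogeneous and have complex structures, the resources they contain are difficult to query, and automatic tools are urgently needed to address this problem. In particular, we consider the problem of constructing a mediator schema, which provides the necessary information about the structure of the resources and through which the data can be queried. In this paper, we present a new schema-mining approach, called FFTM, that relies on the concept of soft embedding in order to extract more relevant knowledge. Crisp methods often discard interesting approximate patterns; to avoid this, we adopt fuzzy constraints for discovering and validating frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled as labeled trees. FFTM has been tested and validated on synthetic databases and on XML document databases. The experimental results show that the approach is relevant and alleviates this limitation of crisp approaches.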
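The abstract names the ingredients of FFTM (labeled trees, frequent substructures, fuzzy constraints that soften the embedding of siblings) but does not define them. The Python sketch below illustrates only the general idea, under assumptions that are ours and not the paper's: an ordered embedding in which matched pattern siblings should ideally be adjacent in the data, a hypothetical linear membership function `gap_degree` that tolerates skipped siblings with a decreasing degree, the minimum as fuzzy AND, and support computed as the average best embedding degree per tree. All names (`node`, `gap_degree`, `match_degree`, `embed_degree`, `fuzzy_support`, `tolerance`, `MINSUP`) are hypothetical.

```python
from itertools import combinations

# HYPOTHETICAL sketch, not FFTM's actual definitions.
# A labeled ordered tree is a pair (label, children); children is a tuple of trees.


def node(label, *children):
    """Build a labeled ordered tree (hypothetical helper)."""
    return (label, tuple(children))


def gap_degree(gap, tolerance):
    """Assumed membership function for a sibling gap.

    1.0 when matched siblings are adjacent (gap == 0), decreasing linearly,
    and 0.0 once the gap exceeds `tolerance`. With tolerance=0 this reduces
    to a crisp constraint that only accepts adjacent siblings.
    """
    return max(0.0, 1.0 - gap / (tolerance + 1))


def match_degree(pattern, data, tolerance):
    """Best degree to which `pattern` embeds with its root mapped to the root of `data`."""
    p_label, p_children = pattern
    d_label, d_children = data
    if p_label != d_label:
        return 0.0
    if not p_children:
        return 1.0
    best = 0.0
    # Try every order-preserving assignment of pattern children to data children.
    for idxs in combinations(range(len(d_children)), len(p_children)):
        degree = 1.0
        for k, (child, i) in enumerate(zip(p_children, idxs)):
            # Fuzzy AND (minimum) of the child's own match degree ...
            degree = min(degree, match_degree(child, d_children[i], tolerance))
            if k > 0:
                # ... and of the gap degree between consecutive matched siblings.
                gap = i - idxs[k - 1] - 1
                degree = min(degree, gap_degree(gap, tolerance))
            if degree == 0.0:
                break
        best = max(best, degree)  # keep the best embedding found so far
    return best


def embed_degree(pattern, data, tolerance):
    """Best embedding degree of `pattern` rooted at any node of `data`."""
    best = match_degree(pattern, data, tolerance)
    for child in data[1]:
        best = max(best, embed_degree(pattern, child, tolerance))
    return best


def fuzzy_support(pattern, database, tolerance):
    """Average best embedding degree over all trees (sigma-count style support)."""
    return sum(embed_degree(pattern, t, tolerance) for t in database) / len(database)


# Toy XML-like database: the pattern occurs exactly, with one extra sibling
# in between, and with its children in reversed order.
db = [
    node("book", node("title"), node("author"), node("year")),
    node("book", node("title"), node("isbn"), node("author")),
    node("book", node("author"), node("title")),
]
pattern = node("book", node("title"), node("author"))
MINSUP = 0.5

for tol in (0, 2):
    s = fuzzy_support(pattern, db, tol)
    print(f"tolerance={tol}: support={s:.3f}, frequent={s >= MINSUP}")
# tolerance=0: support=0.333, frequent=False
# tolerance=2: support=0.556, frequent=True
```

Setting `tolerance=0` recovers a crisp adjacency check, under which the pattern reaches only 1/3 support and is discarded at a 0.5 threshold; tolerating up to two skipped siblings keeps it as an approximate pattern. The exhaustive search over child assignments is exponential and is meant for illustration only, not as FFTM's optimized mining procedure.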