
22nd International Conference on Data Engineering Workshops (ICDEW'06): Latest Publications

Category-based Functional Information Modeling for eChronicles
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.38
Pilho Kim, R. Jain
In this paper, a category-based information model is introduced for eChronicles. It features the use of an e-node to represent the identity of information and uses categorized relationships to represent the relations of grouped information sets while preserving their internal data set structures. Our approach separates a data set and its symbolic objects by introducing an e-node between them and merging those pairs through categorical transformation. Our model also supports a functional system representation using functors and natural transformations from category theory, handling complex information processing and the relationships between functions in a canonical way. We demonstrate our theory by converting scattered heterogeneous information into structured data usable by eChronicles. Our focus in this paper is on presenting the theoretical framework we are developing to represent heterogeneous data in a way that preserves the essential characteristics of the data and the processes used to extract symbols from it.
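The abstract states the model in category-theoretic terms; the following hypothetical Python sketch only illustrates the idea of an e-node sitting between a raw data set and its extracted symbolic objects, with categorized relationships linking e-nodes. All names and fields here are assumptions, not the authors' formalism.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ENode:
    """Identity node placed between a raw data set and its symbolic objects."""
    identity: str
    data_set: Any                                        # underlying data set, structure preserved
    symbols: List[str] = field(default_factory=list)     # symbolic objects extracted from the data
    relations: Dict[str, "ENode"] = field(default_factory=dict)  # categorized relationships

def relate(source: ENode, category: str, target: ENode) -> None:
    """Record a categorized relationship between two e-nodes."""
    source.relations[category] = target

# hypothetical usage: two heterogeneous sources linked through their e-nodes
photos = ENode("event42/photos", data_set=["img001.jpg", "img002.jpg"], symbols=["meeting", "whiteboard"])
notes = ENode("event42/notes", data_set={"text": "Kickoff meeting minutes"}, symbols=["meeting"])
relate(photos, "same-event", notes)
print(photos.relations["same-event"].identity)   # -> event42/notes
```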
Citations: 0
Clustering Multidimensional Trajectories based on Shape and Velocity
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.39
Y. Yanagisawa, T. Satoh
Recently, the analysis of moving objects has become one of the most important technologies in applications such as GIS, navigation systems, and location-based information systems. Existing geographic analysis approaches are based on the points where each object is located at a certain time. These techniques can extract interesting motion patterns from each moving object, but they cannot extract relative motion patterns across many moving objects. Therefore, to retrieve moving objects whose trajectory shape is similar to that of a given moving object, we propose queries based on the similarity between the shapes of moving object trajectories. Our proposed technique can find trajectories whose shape is similar to a given trajectory. We define the shape-based similarity query over trajectories as an extension of similarity queries for time series data, and we then propose a new similarity-based clustering technique that combines the velocities of moving objects with their shapes. Moreover, we show the effectiveness of our proposed clustering method through a performance study using moving rickshaw data.
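As a rough illustration of combining trajectory shape with velocity in a similarity measure (not the paper's actual definition), one could resample trajectories, remove translation for the shape term, and compare stepwise displacements for the velocity term. The Python sketch below, with assumed resampling length and weighting, shows one such combination.

```python
import numpy as np

def resample(traj: np.ndarray, n: int = 32) -> np.ndarray:
    """Resample a (k, d) trajectory to n points by linear interpolation along its index."""
    k, d = traj.shape
    old_t = np.linspace(0.0, 1.0, k)
    new_t = np.linspace(0.0, 1.0, n)
    return np.column_stack([np.interp(new_t, old_t, traj[:, j]) for j in range(d)])

def shape_velocity_distance(a: np.ndarray, b: np.ndarray, w: float = 0.5) -> float:
    """Weighted combination of a translation-invariant shape distance and a velocity distance.

    Both inputs are (k, d) arrays of positions sampled at uniform time steps;
    w weighs shape against velocity.
    """
    ra, rb = resample(a), resample(b)
    shape_d = np.linalg.norm((ra - ra.mean(axis=0)) - (rb - rb.mean(axis=0)))
    vel_d = np.linalg.norm(np.diff(ra, axis=0) - np.diff(rb, axis=0))
    return w * shape_d + (1.0 - w) * vel_d

# toy example: two 2-D trajectories with the same shape but different sampling/speed profiles
t1 = np.column_stack([np.linspace(0, 1, 10), np.linspace(0, 1, 10) ** 2])
t2 = np.column_stack([np.linspace(0, 1, 50), np.linspace(0, 1, 50) ** 2])
print(shape_velocity_distance(t1, t2))
```

A pairwise distance matrix built from this measure could then feed any standard clustering routine.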
Citations: 25
Estimating Top N Hosts in Cardinality Using Small Memory Resources
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.56
K. Ishibashi, Tatsuya Mori, R. Kawahara, Yutaka Hirokawa, A. Kobayashi, K. Yamamoto, H. Sakamoto
We propose a method to find the N hosts that have the N highest cardinalities, where cardinality is the number of distinct items such as flows, ports, or peer hosts, and to estimate those cardinalities. Existing algorithms for finding the top N frequent items can be applied directly to find the N hosts that send the largest numbers of packets in a packet stream, but finding the hosts with the N highest cardinalities requires a table of previously seen items for each host to check whether an item in an arriving packet is new, which requires a lot of memory. Even with existing cardinality estimation methods, we still need cardinality information about each host. In this paper, we use a property of cardinality estimation: the cardinality of the intersection of multiple data sets can be estimated from the cardinality information of each data set. Using this property, we propose an algorithm that does not maintain a table for each host but only for partitioned addresses of a host, and estimates the cardinality of a host as the intersection of the cardinalities of its partitioned addresses. We also propose a method to find the top N hosts in cardinality, which can be monitored to detect anomalous behavior in networks. We evaluate our algorithm on actual backbone traffic data. While the estimation accuracy of our scheme degrades for small cardinalities, for the top 100 hosts the accuracy of our algorithm with 4,096 tables is almost the same as keeping a table for every host.
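The key idea can be illustrated with inclusion-exclusion over per-partition tables: split each host address into two halves, track the distinct items seen per half, and estimate a host's cardinality as the intersection of its two partitions. The Python sketch below uses exact sets purely for brevity; the paper's method would instead use small probabilistic cardinality estimators, and the partitioning scheme and names here are assumptions.

```python
from collections import defaultdict

# Per-partition tables: distinct items seen for each address half.
# In the paper's setting these would be compact probabilistic counters;
# plain sets are used here only to keep the sketch short.
upper_items = defaultdict(set)   # keyed by the upper half of the host address
lower_items = defaultdict(set)   # keyed by the lower half

def observe(host: str, item: str) -> None:
    a, b, c, d = host.split(".")
    upper_items[(a, b)].add(item)
    lower_items[(c, d)].add(item)

def estimate_cardinality(host: str) -> int:
    """Estimate |items(host)| as |U ∩ L| = |U| + |L| - |U ∪ L| from the two partitions."""
    a, b, c, d = host.split(".")
    u, l = upper_items[(a, b)], lower_items[(c, d)]
    return len(u) + len(l) - len(u | l)

observe("10.0.1.2", "flow-x")
observe("10.0.1.2", "flow-y")
observe("10.0.9.9", "flow-z")             # shares the upper half with 10.0.1.2
print(estimate_cardinality("10.0.1.2"))   # -> 2
```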
Citations: 6
Mining Executive Compensation Data from SEC Filings
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.89
Chengmin Ding, Ping Chen
In recent years, corporate governance has become an important concern in investment decision-making. As one of the most important factors in evaluating corporate governance, executive compensation has drawn a lot of attention, and most companies with excessive executive pay are linked with scandals or corporate failures. This paper presents ECRS (Executive Compensation Retrieval System), a text mining system that automatically extracts executive compensation data from SEC (http://www.sec.gov) proxy filings. An analysis based on the extracted data is provided, and some examples of using the raw data to derive information useful to financial analysts are also presented.
Citations: 4
Pragmatics and Open Problems for Inter-schema Constraint Theory
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.111
A. Rosenthal, Leonard J. Seligman
We consider pragmatic issues in applying constraint-based theories (such as the one developed for data exchange) to a variety of problems. We identify disconnects between theoreticians and tool developers, and propose principles for creating problem units that are appropriate for tools. Our Downstream Principle then explains why automated schema mapping is a prerequisite to transitioning schema-matching prototypes. Next, we compare the concerns, strengths, and weaknesses of database and AI approaches to data exchange. Finally, we discuss how constrained update by business processes is a central difficulty in maintaining n-tier applications, and compare its challenges with those of data exchange and conventional view update.
Citations: 2
A Self-Organizing Search Engine for RSS Syndicated Web Contents
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.19
Ying Zhou, Xin Chen, Chen Wang
The exponentially growing volume of information published on the Web today reaches the public largely through a few major search engines such as Google. This raises questions such as: 1. what fraction of the content shared over the Internet do these search engines cover? 2. how easy is it to find less popular content on the Web through the page ranking systems of these search engines? In fact, the increasingly dynamic nature of information distributed on the Internet challenges the flexibility of these centralized search engines. As the amount of structured and semi-structured data on the Internet increases, self-organizing search engines capable of providing sufficient coverage for data that follow certain structures become more and more attractive. In this paper, we propose soSpace, a self-organizing search engine for RSS syndicated web data. soSpace is built on structured peer-to-peer technology and enables indexing and searching of frequently updated web information described by RSS feeds. Our experimental results show that it scales well as the volume of content increases, and the recall and precision of its results are satisfactory.
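The abstract does not detail soSpace's protocol; the following toy Python sketch, with hypothetical peer names and feed URLs, only illustrates how structured peer-to-peer indexing can spread RSS terms across peers via consistent hashing.

```python
import hashlib
from bisect import bisect_right

class DHTIndex:
    """Toy consistent-hashing index: each term is stored on the peer whose id
    follows the term's hash on the ring, so feed updates are spread across peers."""

    def __init__(self, peers):
        self.ring = sorted((self._h(p), p) for p in peers)

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def peer_for(self, key: str) -> str:
        ids = [i for i, _ in self.ring]
        pos = bisect_right(ids, self._h(key)) % len(self.ring)
        return self.ring[pos][1]

    def publish(self, store, term: str, feed_url: str) -> None:
        """Add the feed to the posting list for this term on the responsible peer."""
        store.setdefault(self.peer_for(term), {}).setdefault(term, set()).add(feed_url)

# hypothetical usage: index terms from an RSS item onto three peers
index = DHTIndex(["peer-a", "peer-b", "peer-c"])
store = {}
for term in ["icdew", "workshop", "data"]:
    index.publish(store, term, "http://example.org/feed.rss")
print(store)
```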
Citations: 10
Semantic Model to Integrate Biological Resources
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.133
Z. Lacroix, L. Raschid, Maria-Esther Vidal
We present a framework that uses semantic modeling to represent biological data sources, the multiple links that capture relationships among them, and the various applications that transform or analyze biological data. We introduce a data model that encompasses three layers: an ontological layer, composed of an ontology representing the scientific concepts and their relationships; a physical layer, consisting of the physical resources made available to the scientists; and a data layer, composed of the entries accessible through the different resources.
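As a plain illustration of the three layers (with invented resource names and entries, not the authors' actual data model), the structure can be pictured as follows.

```python
# Ontological layer: scientific concepts and the relationships between them.
ontology = {
    "concepts": ["Gene", "Protein"],
    "relationships": [("Gene", "codes-for", "Protein")],
}

# Physical layer: the resources made available to scientists, each typed by the
# concept its entries represent (resource names and URLs are illustrative only).
resources = {
    "GeneDB": {"concept": "Gene", "url": "http://example.org/genedb"},
    "ProtDB": {"concept": "Protein", "url": "http://example.org/protdb"},
}

# Data layer: entries accessible through each resource.
entries = {
    "GeneDB": ["gene:0001", "gene:0002"],
    "ProtDB": ["prot:ABC1"],
}

# A concept-level relationship can then be followed at the data level by moving from
# entries of one resource to entries of a resource typed by the related concept.
```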
Citations: 16
Quality Estimation of Local Contents Based on PageRank Values of Web Pages
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.121
Y. Kabutoya, T. Yumoto, S. Oyama, Keishi Tajima, Katsumi Tanaka
Recently, searching local contents rather than Web contents, e.g., with Google Desktop Search, has become more common. Google succeeded in Web search because of its PageRank algorithm for ranking search results. PageRank estimates the quality of Web pages based on their popularity, which in turn is estimated from the number and quality of pages referring to them through hyperlinks. This algorithm, however, is not applicable when we search local contents that have no link structure, such as text data. In this research, we propose a method to estimate the quality of local contents without link structure by using the PageRank values of similar Web contents. Based on this estimate, we can rank desktop search results. Furthermore, this method enables us to search contents across different resources, such as Web contents and local contents. In this paper, we apply the method to Web contents, compute scores that estimate their quality, and compare them with the page quality scores given by PageRank.
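One plausible reading of the approach, sketched below in Python with made-up PageRank values, is to transfer quality from the Web to a local document as a similarity-weighted average of the PageRank of similar Web pages; the exact weighting used in the paper is not given in the abstract, so this is only an assumption.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def local_quality(local_terms: Counter, web_pages) -> float:
    """Score a local document by the PageRank of similar Web pages, weighted by
    textual similarity (web_pages: list of (term Counter, pagerank) pairs)."""
    weights = [(cosine(local_terms, terms), pr) for terms, pr in web_pages]
    total = sum(w for w, _ in weights)
    return sum(w * pr for w, pr in weights) / total if total else 0.0

# hypothetical usage with made-up PageRank values
local_doc = Counter("page rank search quality".split())
web = [(Counter("page rank algorithm web search".split()), 0.8),
       (Counter("cooking recipes pasta".split()), 0.3)]
print(local_quality(local_doc, web))
```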
Citations: 3
Toward a Query Language for Network Attack Data
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.149
Bee-Chung Chen, V. Yegneswaran, P. Barford, R. Ramakrishnan
The growing sophistication and diversity of malicious activity in the Internet presents a serious challenge for network security analysts. In this paper, we describe our efforts to develop a database and query language for network attack data from firewalls, intrusion detection systems and honeynets. Our first step toward this objective is to develop a prototype database and query interface to identify coordinated scanning activity in network attack data. We have created a set of aggregate views and templatized SQL queries that consider timing, persistence, targeted services, spatial dispersion and temporal dispersion, thereby enabling us to evaluate coordinated scanning along these dimensions. We demonstrate the utility of the interface by conducting a case study on a set of firewall and intrusion detection system logs from Dshield.org. We show that the interface is able to identify general characteristics of coordinated activity as well as instances of unusual activity that would otherwise be difficult to mine from the data. These results highlight the potential for developing a more generalized query language for a broad class of network intrusion data. The case study also exposes some of the challenges we face in extending our system to more generalized queries over potentially vast quantities of data.
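As an example of what a templatized SQL query over attack logs might look like (the schema, thresholds, and parameter names below are assumptions, not the authors' actual templates), the following Python sketch groups alerts by target port and flags ports probed by many distinct sources within a time window.

```python
import sqlite3

# A templatized query in the spirit of the paper: count the distinct sources that
# probed each target port within a time window (schema and thresholds are assumed).
COORDINATED_SCAN_SQL = """
SELECT target_port,
       COUNT(DISTINCT source_ip) AS distinct_sources,
       MIN(ts) AS first_seen,
       MAX(ts) AS last_seen
FROM alerts
WHERE ts BETWEEN :start AND :end
GROUP BY target_port
HAVING COUNT(DISTINCT source_ip) >= :min_sources
ORDER BY distinct_sources DESC;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (ts INTEGER, source_ip TEXT, target_port INTEGER)")
conn.executemany("INSERT INTO alerts VALUES (?, ?, ?)",
                 [(1, "10.0.0.1", 445), (2, "10.0.0.2", 445), (3, "10.0.0.3", 445)])
for row in conn.execute(COORDINATED_SCAN_SQL, {"start": 0, "end": 10, "min_sources": 2}):
    print(row)   # -> (445, 3, 1, 3)
```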
Citations: 14
Towards a Quality Model for Effective Data Selection in Collaboratories
Pub Date : 2006-04-03 DOI: 10.1109/ICDEW.2006.150
Yogesh L. Simmhan, Beth Plale, Dennis Gannon
Data-driven scientific applications use workflow frameworks to execute complex dataflows, resulting in derived data products of unknown quality. We discuss our ongoing research on a quality model that provides users with an integrated estimate of data quality tuned to their application needs. The estimate is exposed as a numerical quality score that enables uniform comparison of datasets, giving the community a way to trust derived data.
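A minimal sketch of such an integrated score, assuming the per-dimension metrics are already normalized to [0, 1] and that the weights encode the application's needs (metric names are illustrative only):

```python
def quality_score(metrics: dict, weights: dict) -> float:
    """Combine normalized per-dimension quality metrics into one score using
    application-tuned weights; missing metrics default to 0."""
    total = sum(weights.values())
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights) / total if total else 0.0

# hypothetical comparison of two derived data products
product_a = {"provenance_completeness": 0.9, "timeliness": 0.6, "community_rating": 0.8}
product_b = {"provenance_completeness": 0.5, "timeliness": 0.9, "community_rating": 0.7}
weights = {"provenance_completeness": 0.5, "timeliness": 0.3, "community_rating": 0.2}
print(quality_score(product_a, weights), quality_score(product_b, weights))
```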
Citations: 31