Shirou Maruyama, M. Takeda, Masaya Nakahara, H. Sakamoto
Grammar-based compression is a well-studied technique for constructing a small context-free grammar (CFG) that uniquely derives a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004], which guarantees a nearly optimum compression ratio and space. LCA, however, is an offline algorithm and requires external storage to keep its main-memory consumption low. We therefore present an online version that inherits most characteristics of the original LCA. Our algorithm guarantees an $O(\log^2 n)$ approximation ratio with respect to the optimum grammar size, and all work is carried out in main memory, within space bounded by the output size. In addition, we propose a more practical encoding based on the parentheses representation of a binary tree. Experimental results on repetitive texts demonstrate that our algorithm compresses effectively compared to other practical compressors and that its space consumption is smaller than the input text size.
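The abstract mentions an encoding based on the parentheses representation of a binary tree but does not spell it out; the sketch below is one standard way to realize such a representation (a pre-order bit string plus a leaf-symbol stream), offered as an assumption-laden illustration rather than the authors' exact encoding:

```python
# Hypothetical sketch: parentheses-style encoding of a binary parse tree.
# A leaf is a terminal symbol; an internal node is a pair (left, right),
# as produced by a grammar whose rules all have the form X -> Y Z.

def encode(node, bits, leaves):
    """Pre-order traversal: '1' marks an internal node, '0' marks a leaf."""
    if isinstance(node, tuple):          # internal node with two children
        bits.append('1')
        encode(node[0], bits, leaves)
        encode(node[1], bits, leaves)
    else:                                # leaf: store its symbol separately
        bits.append('0')
        leaves.append(node)

def decode(bits, leaves):
    """Rebuild the tree from the bit string and the leaf-symbol stream."""
    it_bits, it_leaves = iter(bits), iter(leaves)
    def build():
        if next(it_bits) == '1':
            left = build()
            right = build()
            return (left, right)
        return next(it_leaves)
    return build()

if __name__ == "__main__":
    # Parse tree of "abab" under the rules S -> A A, A -> a b.
    tree = (('a', 'b'), ('a', 'b'))
    bits, leaves = [], []
    encode(tree, bits, leaves)
    print(''.join(bits), leaves)         # 1100100 ['a', 'b', 'a', 'b']
    assert decode(bits, leaves) == tree
```

A tree with n leaves needs 2n - 1 bits for its shape, so the pointer structure of the grammar's parse tree is never stored explicitly.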
{"title":"An Online Algorithm for Lightweight Grammar-Based Compression","authors":"Shirou Maruyama, M. Takeda, Masaya Nakahara, H. Sakamoto","doi":"10.1109/CCP.2011.40","DOIUrl":"https://doi.org/10.1109/CCP.2011.40","url":null,"abstract":"Grammar-based compression is a well-studied technique for constructing a small context-free grammar (CFG) uniquely deriving a given text. In this paper, we present an online algorithm for lightweight grammar-based compression. Our algorithm is based on the LCA algorithm [Sakamoto et al. 2004]which guarantees nearly optimum compression ratio and space. LCA, however, is an offline algorithm and requires external space to save space consumption. Therefore, we present its online version which inherits most characteristics of the original LCA. Our algorithm guarantees $O(log^2 n)$-approximation ratio for an optimum grammar size, and all work is carried out on a main memory space which is bounded by the output size. In addition, we propose more practical encoding based on parentheses representation of a binary tree. Experimental results for repetitive texts demonstrate that our algorithm achieves effective compression compared to other practical compressors and the space consumption of our algorithm is smaller than the input text size.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121376914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work presents a generic Intrusion Detection and Diagnosis System, which implements a comprehensive alert correlation workflow for the detection and diagnosis of complex intrusion scenarios in Large-scale Complex Critical Infrastructures. The online detection and diagnosis process is based on a hybrid and hierarchical approach, which detects intrusion scenarios by collecting diverse information at several architectural levels through distributed security probes, and performs complex event correlation using a Complex Event Processing Engine. The escalation from intrusion symptoms to the identified target and cause of the intrusion is driven by a knowledge base represented by an ontology. A prototype implementation of the proposed Intrusion Detection and Diagnosis framework is also presented.
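To make the correlation idea concrete, the toy sketch below groups symptom alerts from different probes on the same target within a time window; the event names, the rule, and the threshold are illustrative assumptions, not the authors' ontology or CEP engine:

```python
# Toy sketch of windowed alert correlation (the real system uses a CEP engine
# and an ontology-driven escalation; names and the rule here are illustrative).
from collections import namedtuple

Alert = namedtuple("Alert", "time probe symptom target")

def correlate(alerts, window=30.0):
    """Flag targets whose symptoms, reported by different probes within
    `window` seconds, match an assumed multi-level intrusion scenario."""
    scenarios = []
    alerts = sorted(alerts, key=lambda a: a.time)
    for i, a in enumerate(alerts):
        related = [b for b in alerts[i:] if b.target == a.target
                   and b.time - a.time <= window]
        symptoms = {b.symptom for b in related}
        probes = {b.probe for b in related}
        # Illustrative rule: network- and host-level symptoms on the same
        # target within the window escalate to an intrusion scenario.
        if {"port_scan", "privilege_escalation"} <= symptoms and len(probes) > 1:
            scenarios.append((a.target, sorted(symptoms)))
    return scenarios

alerts = [Alert(0.0, "net_probe", "port_scan", "srv1"),
          Alert(12.5, "host_probe", "privilege_escalation", "srv1")]
print(correlate(alerts))   # [('srv1', ['port_scan', 'privilege_escalation'])]
```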
{"title":"A Generic Intrusion Detection and Diagnoser System Based on Complex Event Processing","authors":"M. Ficco, L. Romano","doi":"10.1109/CCP.2011.43","DOIUrl":"https://doi.org/10.1109/CCP.2011.43","url":null,"abstract":"This work presents a generic Intrusion Detection and Diagnosis System, which implements a comprehensive alert correlation workflow for detection and diagnosis of complex intrusion scenarios in Large scale Complex Critical Infrastructures. The on-line detection and diagnosis process is based on an hybrid and hierarchical approach, which allows to detect intrusion scenarios by collecting diverse information at several architectural levels, using distributed security probes, as well as perform complex event correlation based on a Complex Event Processing Engine. The escalation process from intrusion symptoms to the identified target and cause of the intrusion is driven by a knowledge-base represented by an ontology. A prototype implementation of the proposed Intrusion Detection and Diagnosis framework is also presented.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115533272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intrusion Detection Systems are the major technology used for protecting information systems. However, they do not detect intrusions directly; they only monitor attack symptoms. Therefore, no assumption can be made about the outcome of an attack, and no assurance can be given once the system is compromised. Intrusion tolerance techniques focus on providing a minimal level of service even when the system has been partially compromised. This paper presents an intrusion-tolerant approach for Denial of Service attacks against Web Services. It focuses on the detection of attack symptoms as well as the diagnosis of intrusion effects, in order to react only if the attack succeeds. In particular, this work focuses on a specific Denial of Service attack, called Deeply-Nested XML. Preliminary experimental results show that the proposed approach improves the performance of Intrusion Detection Systems, both by increasing diagnosis capacity and by reducing service unavailability during an intrusion.
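A Deeply-Nested XML attack exhausts the service's parser with pathologically nested elements; the check below is a minimal illustration of that symptom using a streaming parser (the depth limit and the detection strategy are assumptions, not the authors' detector):

```python
# Minimal sketch: detect the Deeply-Nested XML symptom by bounding element
# depth during streaming parsing (the limit value is an arbitrary assumption).
import xml.parsers.expat

def nesting_depth(xml_bytes, limit=100):
    """Return the maximum element depth, aborting once `limit` is exceeded."""
    depth = {"cur": 0, "max": 0}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        depth["cur"] += 1
        depth["max"] = max(depth["max"], depth["cur"])
        if depth["cur"] > limit:
            raise ValueError("possible Deeply-Nested XML attack")

    def end(name):
        depth["cur"] -= 1

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return depth["max"]

payload = b"<a>" * 150 + b"x" + b"</a>" * 150   # pathologically nested request
try:
    nesting_depth(payload)
except ValueError as e:
    print(e)                                     # the symptom is reported
```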
{"title":"Intrusion Tolerant Approach for Denial of Service Attacks to Web Services","authors":"M. Ficco, M. Rak","doi":"10.1109/CCP.2011.44","DOIUrl":"https://doi.org/10.1109/CCP.2011.44","url":null,"abstract":"Intrusion Detection Systems are the major technology used for protecting information systems. However, they do not directly detect intrusion, but they only monitor the attack symptoms. Therefore, no assumption can be made on the outcome of the attack, no assurance can be assumed once the system is compromised. The intrusion tolerance techniques focus on providing minimal level of services, even when the system has been partially compromised. This paper presents an intrusion tolerant approach for Denial of Service attacks to Web Services. It focuses on the detection of attack symptoms as well as the diagnosis of intrusion effects in order to perform a proper reaction only if the attack succeeds. In particular, this work focuses on a specific Denial of Service attack, called Deeply-Nested XML. Preliminary experimental results show that the proposed approach results in a better performance of the Intrusion Detection Systems, in terms of increasing diagnosis capacity as well as reducing the service unavailability during an intrusion.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130290173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fast development of the Graphics Processing Unit (GPU) has led to the popularity of general-purpose computing on GPUs (GPGPU). Most modern computers now have a CPU-GPGPU heterogeneous architecture, with the CPU as the host processor. In this work, we propose a multithreaded file chunking prototype system that exploits the hardware organization of the CPU-GPGPU heterogeneous computer and determines which device should be used to chunk a file, in order to accelerate the content-based file chunking operation of deduplication. We built rules by which the system chooses the chunking device, and we also determined the optimal choice of other related parameters of both the CPU and GPGPU subsystems, such as segment size and block dimension. The prototype was implemented and tested. Results using a GTX460 (336 cores) and an Intel i5 (four cores) show that this system increases chunking speed by 63% compared to using the GPGPU alone and by 80% compared to using the CPU alone.
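The operation being accelerated is content-defined chunking: a rolling hash slides over the data and a chunk boundary is declared whenever the hash satisfies a boundary condition. The CPU-side sketch below illustrates that operation; the polynomial hash, window size, and mask are assumptions, since the abstract does not fix them:

```python
# Minimal CPU-side sketch of content-defined chunking for deduplication.
# The rolling hash, window size and boundary mask are illustrative only;
# the paper itself is about scheduling this work between CPU and GPGPU.
import random

def chunk_boundaries(data: bytes, window=48, mask=(1 << 13) - 1):
    """Yield cut points where the rolling hash of the last `window` bytes
    satisfies the boundary condition (expected chunk size ~ mask + 1)."""
    MOD = (1 << 61) - 1
    BASE = 257
    shift = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * shift) % MOD
        h = (h * BASE + b) % MOD
        if i + 1 - start >= window and (h & mask) == mask:
            yield i + 1                  # chunk boundary after byte i
            start = i + 1
    if start < len(data):
        yield len(data)                  # final partial chunk

random.seed(0)
data = bytes(random.getrandbits(8) for _ in range(1 << 16))
print(list(chunk_boundaries(data))[:5])
```

Because boundaries depend only on local content, inserting bytes near the start of a file shifts at most a few chunk boundaries, which is what makes this chunking useful for deduplication.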
{"title":"Multithread Content Based File Chunking System in CPU-GPGPU Heterogeneous Architecture","authors":"Zhi Tang, Y. Won","doi":"10.1109/CCP.2011.20","DOIUrl":"https://doi.org/10.1109/CCP.2011.20","url":null,"abstract":"the fast development of Graphics Processing Unit (GPU) leads to the popularity of General-purpose usage of GPU (GPGPU). So far, most modern computers are CPU-GPGPU heterogeneous architecture and CPU is used as host processor. In this work, we promote a multithread file chunking prototype system, which is able to exploit the hardware organization of the CPU-GPGPU heterogeneous computer and determine which device should be used to chunk the file to accelerate the content based file chunking operation of deduplication. We built rules for the system to choose which device should be used to chunk file and also found the optimal choice of other related parameters of both CPU and GPGPU subsystem like segment size and block dimension. This prototype was implemented and tested. The result of using GTX460(336 cores) and Intel i5 (four cores) shows that this system can increase the chunking speed 63% compared to using GPGPU alone and 80% compared to using CPU alone.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122043683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a new natural language compression method: Semi-adaptive Two Byte Dense Code (STBDC). STBDC performs compression per blocks: the input is divided into several blocks, and each block is compressed separately according to its own statistical model. To avoid redundancy, the final vocabulary file is composed as the sequence of changes in the model between two consecutive blocks. STBDC belongs to the family of Dense codes and keeps all their attractive properties, including very high compression and decompression speed and an acceptable compression ratio of around 32% on natural language text. Moreover, STBDC provides further properties applicable in digital libraries and other textual databases. The compression method allows direct searching on the compressed text, and the vocabulary can be used as a block index. STBDC is well suited to limited bandwidth in a client/server architecture, since individual compressed blocks can be sent together with only the corresponding part of the vocabulary. Furthermore, STBDC enables various approaches to updating and extending the compressed text.
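Dense codes assign byte-oriented codewords to words by frequency rank. The sketch below is an End-Tagged-Dense-Code-style illustration limited to two bytes per codeword; it is an assumption-based stand-in that does not reproduce STBDC's exact codeword assignment or its per-block vocabulary-difference mechanism:

```python
# Illustrative rank-based dense coding with at most two bytes per word.
# End tag convention (ETDC-style): a byte >= 128 terminates a codeword.
# This sketch handles up to 128 + 128*128 distinct words.
from collections import Counter

def build_ranks(words):
    """Map each distinct word to its rank in decreasing frequency order."""
    freq = Counter(words)
    ordered = sorted(freq, key=lambda w: (-freq[w], w))
    return {w: r for r, w in enumerate(ordered)}, ordered

def encode(words, ranks):
    out = bytearray()
    for w in words:
        r = ranks[w]
        if r < 128:                       # one-byte codeword
            out.append(128 + r)
        else:                             # two-byte codeword
            r -= 128
            out.append(r // 128)          # continuation byte, value < 128
            out.append(128 + r % 128)     # end-tagged final byte
    return bytes(out)

def decode(code, ordered):
    words, v, nbytes = [], 0, 0
    for b in code:
        nbytes += 1
        if b < 128:
            v = v * 128 + b               # continuation byte
        else:
            v = v * 128 + (b - 128)       # end tag closes the codeword
            rank = v + (128 if nbytes == 2 else 0)
            words.append(ordered[rank])
            v, nbytes = 0, 0
    return words

text = "to be or not to be".split()
ranks, ordered = build_ranks(text)
code = encode(text, ranks)
assert decode(code, ordered) == text
print(len(code), "bytes for", len(text), "words")
```

Because every codeword ends in a byte with the high bit set, a compressed pattern can be located directly in the compressed text with an ordinary byte-wise search, which is the property the abstract exploits for direct searching.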
{"title":"Natural Language Compression per Blocks","authors":"P. Procházka, J. Holub","doi":"10.1109/CCP.2011.25","DOIUrl":"https://doi.org/10.1109/CCP.2011.25","url":null,"abstract":"We present a new natural language compression method: Semi-adaptive Two Byte Dense Code (STBDC). STBDC performs compression per blocks. It means that the input is divided into the several blocks and each of the blocks is compressed separately according to its own statistical model. To avoid the redundancy the final vocabulary file is composed as the sequence of the changes in the model of the two consecutive blocks. STBDC belongs to the family of Dense codes and keeps all their attractive properties including very high compression and decompression speed and acceptable compression ratio around 32 % on natural language text. Moreover STBDC provides other properties applicable in digital libraries and other textual databases. The compression method allows direct searching on the compressed text, whereas the vocabulary can be used as a block index. STBDC is very easy on limited bandwidth in the client/server architecture. It can send namely single compressed blocks only with corresponding part of the vocabulary. Further STBDC enables various approaches of updating and extending of the compressed text.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127488581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Hernández-Cabronero, Ian Blanes, J. Serra-Sagristà, M. Marcellin
We review the state of the art in DNA microarray image compression. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution. We then summarize the compression results reported for these microarray-specific image compression schemes. In a set of experiments conducted for this paper, we obtain results for several popular image coding techniques, including the most recent coding standards. The prediction-based schemes CALIC and JPEG-LS, and JPEG2000 using zero wavelet decomposition levels, are the best performing standard compressors, but all are outperformed by the best microarray-specific technique, Battiato's CNN-based scheme.
{"title":"A Review of DNA Microarray Image Compression","authors":"Miguel Hernández-Cabronero, Ian Blanes, J. Serra-Sagristà, M. Marcellin","doi":"10.1109/CCP.2011.21","DOIUrl":"https://doi.org/10.1109/CCP.2011.21","url":null,"abstract":"We review the state of the art in DNA micro array image compression. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution. We then summarize the compression results reported for these specific-specific image compression schemes. In a set of experiments conducted for this paper, we obtain results for several popular image coding techniques, including the most recent coding standards. Prediction-based schemes CALIC and JPEG-LS, and JPEG2000 using zero wavelet decomposition levels are the best performing standard compressors, but are all outperformed by the best micro array-specific technique, Battiato's CNN-based scheme.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128618246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For each $j \geq 1$, if $T_j$ is the finite rooted binary tree with $2^j$ leaves, the hierarchical type class of a binary string $x$ of length $2^j$ is obtained by placing the entries of $x$ as labels on the leaves of $T_j$ and then forming all permutations of $x$ according to the permutations of the leaf labels under all isomorphisms of the tree $T_j$ into itself. The set of binary strings of length $2^j$ is partitioned into hierarchical type classes, and in each such class, all of the strings have the same type $(n_0^j, n_1^j)$, where $n_0^j, n_1^j$ are respectively the numbers of zeroes and ones in the strings. Let $p(n_0^j, n_1^j)$ be the probability vector $(n_0^j/2^j, n_1^j/2^j)$ belonging to the set ${\cal P}_2$ of all two-dimensional probability vectors. For each $j \geq 1$, and each of the $2^j+1$ possible types $(n_0^j, n_1^j)$, a hierarchical type class ${\cal S}(n_0^j, n_1^j)$ is specified. Conditions are investigated under which there will exist a function $h:{\cal P}_2 \to [0, \infty)$ such that for each $p \in {\cal P}_2$, if $\{(n_0^j, n_1^j): j \geq 1\}$ is any sequence of types for which $p(n_0^j, n_1^j) \to p$, then the sequence $\{2^{-j}\log_2({\rm card}({\cal S}(n_0^j, n_1^j))): j \geq 1\}$ converges to $h(p)$. Such functions $h$, called hierarchical entropy functions, play the same role in hierarchical type class coding theory that the Shannon entropy function on ${\cal P}_2$ does in traditional type class coding theory, except that there are infinitely many hierarchical entropy functions but only one Shannon entropy function. One of the hierarchical entropy functions $h$ that is studied is a self-affine function for which a closed-form expression is obtained making use of an iterated function system whose attractor is the graph of $h$.
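For intuition, the hierarchical type class containing a given string of length $2^j$ can be enumerated directly from the first sentence of the abstract: every isomorphism of the complete binary tree into itself amounts to optionally swapping the two subtrees at each internal node, so the class of $x$ is obtained by recursively combining the classes of its two halves in either order. A small sketch (my own illustration, not the paper's construction of ${\cal S}(n_0^j, n_1^j)$):

```python
# Enumerate the hierarchical type class of a binary string of length 2^j by
# recursively combining the classes of its two halves in either order.
from math import log2

def hierarchical_class(x):
    if len(x) == 1:
        return {x}
    half = len(x) // 2
    left = hierarchical_class(x[:half])
    right = hierarchical_class(x[half:])
    return ({a + b for a in left for b in right}
            | {b + a for a in left for b in right})

x = "0111"                                   # j = 2, type (n0, n1) = (1, 3)
cls = hierarchical_class(x)
print(sorted(cls))                           # ['0111', '1011', '1101', '1110']
print(log2(len(cls)) / len(x))               # 2^{-j} log2 card(.), here 0.5
```

The printed normalized log-cardinality is exactly the quantity whose limiting behaviour defines the hierarchical entropy function $h(p)$ in the abstract.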
{"title":"Hierarchical Type Classes and Their Entropy Functions","authors":"J. Kieffer","doi":"10.1109/CCP.2011.36","DOIUrl":"https://doi.org/10.1109/CCP.2011.36","url":null,"abstract":"For each $j geq 1$, if $T_j$ is the finite rooted binary tree with $2^j$ leaves, the hierarchical type class of binary string $x$ of length $2^j$ is obtained by placing the entries of $x$ as label son the leaves of $T_j$ and then forming all permutations of $x$according to the permutations of the leaf labels under all isomorphisms of tree $T_j$ into itself. The set of binary strings of length $2^j$ is partitioned into hierarchical type classes, and in each such class, all of the strings have the same type $(n_0^j, n_1^j)$, where $n_0^j, n_1^j$ are respectively the numbers of zeroes and ones in the strings. Let $p(n_0^j, n_1^j)$ be the probability vector $(n_0^j/2^j, n_1^j/2^j)$belonging to the set ${cal P}_2$ of all two-dimensional probability vectors. For each $j geq 1$, and each of the $2^j+1$ possible types $(n_0^j, n_1^j)$, a hierarchical type class ${cal S}(n_0^j, n_1^j)$is specified. Conditions are investigated under which there will exist a function $h:{cal P}_2to [0, infty)$ such that for each $pin {cal P}_2$, if ${(n_0^j, n_1^j):jgeq 1}$ is any sequence of types for which $p(n_0^j, n_1^j) to p$, then the sequence ${2^{-j}log_2({rm card}({cal S}(n_0^j, n_1^j))):j geq 1}$converges to $h(p)$. Such functions $h$, called hierarchical entropy functions, play the same role in hierarchical type class coding theory that the Shannon entropy function on ${cal P}_2$ does in traditional type class coding theory, except that there are infinitely many hierarchical entropy functions but only one Shannon entropy function. One of the hierarchical entropy functions $h$ that is studied is a self-affine function for which a closed-form expression is obtained making use of an iterated function system whose attractor is the graph of $h$.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper presents a hardware solution for in vivo electrophysiological signal processing, using continuous data acquisition on a PC. The originality of the paper lies in the proposed blocks that selectively amplify the biosignals. One of the major problems in electrophysiological signal monitoring is the impossibility of recording weak signals from deep organs that are covered by noise or by strong cardiac or muscular signals. An automatic gain control block is used, so that high-power skin signals are amplified less than the low-power components. The analog processing block is based on a dynamic range compressor containing the automatic gain control block. The following block is a clipper, which captures the transitions that escape the dynamic range compressor. A low-pass filter is connected at the clipper output to abruptly cut the high frequencies, such as 50 Hz and ECG components. The data vector recording is performed by a microcontroller with strong internal resources, including a ten-bit A/D conversion port.
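The described chain (automatic gain control inside a dynamic range compressor, a clipper, then a low-pass filter before A/D conversion) can be mimicked with a discrete-time software analogue; the sketch below is purely a numerical illustration with assumed coefficients, not the authors' analog hardware design:

```python
# Software analogue of the processing chain: envelope-based automatic gain
# control, hard clipper, and a one-pole low-pass filter. All parameters are
# illustrative assumptions.
import math

def agc(samples, target=0.3, alpha=0.01):
    """Amplify weak components more than strong (e.g. skin/cardiac) ones."""
    env, out = 1e-6, []
    for s in samples:
        env = (1 - alpha) * env + alpha * abs(s)   # slowly tracked envelope
        out.append(s * target / max(env, 1e-6))
    return out

def clip(samples, limit=1.0):
    """Catch transitions that escape the compressor."""
    return [max(-limit, min(limit, s)) for s in samples]

def lowpass(samples, fc=40.0, fs=1000.0):
    """One-pole low-pass to attenuate components above ~fc Hz (e.g. 50 Hz)."""
    a = 1.0 / (1.0 + fs / (2 * math.pi * fc))
    y, out = 0.0, []
    for s in samples:
        y += a * (s - y)
        out.append(y)
    return out

fs = 1000.0
t = [n / fs for n in range(1000)]
signal = [0.05 * math.sin(2 * math.pi * 5 * x)        # weak deep-organ component
          + 1.0 * math.sin(2 * math.pi * 50 * x)      # strong 50 Hz interference
          for x in t]
processed = lowpass(clip(agc(signal)), fc=40.0, fs=fs)
print(max(abs(s) for s in processed))
```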
{"title":"Electrophysiological Data Processing Using a Dynamic Range Compressor Coupled to a Ten Bits A/D Convertion Port","authors":"F. Babarada, C. Ravariu, A. Janel","doi":"10.1109/CCP.2011.24","DOIUrl":"https://doi.org/10.1109/CCP.2011.24","url":null,"abstract":"The paper presents a hardware solution of the in vivo electrophysiological signals processing, using a continuous data acquisition on PC. The originality of the paper comes from some blocks proposal, which selective amplify the bio signals. One of the major problems in the electrophysiological signals monitoring is the impossibility to record the weak signals from deep organs that are covered by noise or by strong cardiac or muscular signals. An automatic gain control block is used, so that the high power skin signals are less amplified than the low components. The analog processing block is based on a dynamic range compressor, containing the automatic gain control block. The following block is a clipper since to capture all the transitions that escape from the dynamic range compressor. At clipper output a low-pass filter is connected since to abruptly cut the high frequencies, like 50Hz, ECG. The data vector recording is performing by strong internal resources micro controller including ten bits A/D conversion port.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114969853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider a compact text index based on evenly spaced sparse suffix trees of a text [9]. Such a tree is defined by partitioning the text into blocks of equal size and constructing the suffix tree only for those suffixes that start at block boundaries. We propose a new pattern matching algorithm on this structure. The algorithm is based on a notion of suffix links different from that of [9] and on the packing of several letters into one computer word.
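In an evenly spaced sparse suffix index only the suffixes starting at multiples of the block size k are indexed, so an occurrence at an arbitrary position is found by trying each of the k possible offsets of the match relative to the next block boundary and verifying the short prefix that precedes the boundary. The sketch below uses a plain sorted array of block-boundary suffixes in place of the paper's sparse suffix tree, and omits its suffix links and letter packing; it assumes the pattern is at least as long as the block size:

```python
# Sketch of pattern matching over an evenly spaced sparse suffix index.
import bisect

def build_sparse_index(text, k):
    """Block-boundary positions (multiples of k), sorted by their suffixes."""
    boundaries = list(range(0, len(text), k))
    boundaries.sort(key=lambda p: text[p:])
    return boundaries

def find(text, pattern, index, k):
    """All occurrences of pattern (assumed len(pattern) >= k) in text."""
    suffixes = [text[p:] for p in index]      # fine for a sketch, not at scale
    hits = set()
    for r in range(k):                        # distance from match start to next boundary
        head, tail = pattern[:r], pattern[r:]
        lo = bisect.bisect_left(suffixes, tail)
        while lo < len(suffixes) and suffixes[lo].startswith(tail):
            b = index[lo]                     # boundary where pattern[r:] matches
            i = b - r
            if i >= 0 and text[i:b] == head:  # verify the part before the boundary
                hits.add(i)
            lo += 1
    return sorted(hits)

text = "abracadabraabracadabra"
k = 4
idx = build_sparse_index(text, k)
print(find(text, "abracada", idx, k))         # [0, 11]
```

Since only n/k suffixes are indexed, the index is a factor of about k smaller than a full suffix tree, at the cost of the extra offset loop during search.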
{"title":"Pattern Matching on Sparse Suffix Trees","authors":"R. Kolpakov, G. Kucherov, Tatiana Starikovskaya","doi":"10.1109/CCP.2011.45","DOIUrl":"https://doi.org/10.1109/CCP.2011.45","url":null,"abstract":"We consider a compact text index based on evenly spaced sparse suffix trees of a text [9]. Such a tree is defined by partitioning the text into blocks of equal size and constructing the suffix tree only for those suffixes that start at block boundaries. We propose a new pattern matching algorithm on this structure. The algorithm is based on a notion of suffix links different from that of [9] and on the packing of several letters into one computer word.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"408 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116035739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An axiomatic approach to the notion of similarity of sequences, which seems natural in many cases (e.g. phylogenetic analysis), is proposed. Even though the sequences are not assumed to be a realization of a probabilistic process (e.g. a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling the training data contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries), where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a priori (and sometimes unjustified) probabilistic assumption. One such case is discussed in detail.
{"title":"An Axiomatic Approach to the Notion of Similarity of Individual Sequences and Their Classification","authors":"J. Ziv","doi":"10.1109/CCP.2011.29","DOIUrl":"https://doi.org/10.1109/CCP.2011.29","url":null,"abstract":"An axiomatic approach to the notion of similarity of sequences, that seems to be natural in many cases (e.g. Phylogenetic analysis), is proposed. Despite of the fact that it is not assume that the sequences are a realization of a probabilistic process (e.g. a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling of the training data that is contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries) where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a-priori (sometimes unjustified)probabilistic assumption. One such case is discussed in details.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123550957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}