J. Aßfalg, H. Kriegel, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz
Published in: Proceedings of the ... Asia-Pacific bioinformatics conference, volume 81 1, pages 307-316 (2005-12-01)
DOI: 10.1142/9781860947292_0034 (https://doi.org/10.1142/9781860947292_0034)
Citations: 2
Semi-Supervised Threshold Queries on Pharmacogenomics Time Sequences
The analysis of time series data is of central importance for pharmacogenomics, since experimental evaluations are usually based on observations of time-dependent reactions or behaviors of organisms. Data mining in time series databases is therefore an important instrument for understanding the effects of drugs on individuals. However, the complex nature of time series poses a major challenge for effective and efficient data mining. In this paper, we focus on the detection of temporal dependencies between different time series: we introduce the novel analysis concept of threshold queries and its semi-supervised extension, which supports parameter setting by means of training datasets. Basically, threshold queries report those time series that exceed a user-defined query threshold at certain time frames. For semi-supervised threshold queries, the corresponding threshold is automatically adjusted to the characteristics of the data set or the training dataset, respectively. To support threshold queries efficiently, we present a new access method that exploits the fact that only partial information about the time series is required at query time. In an extensive experimental evaluation, we demonstrate the performance of our solution and show that semi-supervised threshold queries applied to gene expression data are very worthwhile.

Data mining in time series data is a key step in the study of drugs and their impact on living systems, including the discovery, design, usage, modes of action, and metabolism of chemically defined therapeutics and toxic agents. In particular, the analysis of time series data is of great practical importance for pharmacogenomics. Classical time series analysis is based on techniques for forecasting or for identifying patterns (e.g. trend analysis or seasonality). The similarity between time series, e.g. similar movements of time series, plays a key role in the analysis.
In this paper, we introduce a novel but very important type of similarity query, which we call the threshold query. Given a time series database DB, a query time series Q, and a query threshold τ, a threshold query TSQ_DB(Q, τ) returns those time series X ∈ DB having the most similar sequence of time intervals in which the time series values are above τ. In other words, we assume that each time series X ∈ DB ∪ {Q} is transformed into a sequence of disjoint time intervals covering only those values of X that are (strictly) above the threshold τ. Then, for a given query object Q, a threshold query returns that object X ∈ DB having the most similar sequence of time intervals. Note that the exact values of the time series are not considered; we are only interested in whether the time series is above or below a given threshold τ. In other words, the concept of threshold queries enables us to focus only on the duration of certain events indicated by increased time series amplitudes, while the degree of the corresponding amplitudes is ignored. This advantage is very beneficial, in particular, if we want to compare time
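
The transformation and query described above can be illustrated with a minimal Python sketch. The function names (`threshold_intervals`, `interval_set_distance`, `threshold_query`), the use of sample indices as time points, and especially the distance between interval sequences are assumptions made for illustration only: the text above does not state how the similarity of two interval sequences is measured, and the sketch scans the whole database rather than using the access method mentioned in the abstract.

```python
from math import hypot, inf


def threshold_intervals(values, tau):
    """Return the maximal index runs (start, end), inclusive,
    in which values[i] is strictly above the threshold tau."""
    intervals = []
    start = None
    for i, v in enumerate(values):
        if v > tau and start is None:
            start = i                          # a run above tau begins
        elif v <= tau and start is not None:
            intervals.append((start, i - 1))   # the run ended at i - 1
            start = None
    if start is not None:                      # run reaches the end of the series
        intervals.append((start, len(values) - 1))
    return intervals


def interval_set_distance(a, b):
    """Hypothetical distance between two interval sequences (an assumption,
    not taken from the source): each interval is treated as a 2-D point
    (start, end) and the mean nearest-neighbour distance is averaged
    over both directions."""
    if not a and not b:
        return 0.0
    if not a or not b:
        return inf

    def one_way(src, dst):
        return sum(
            min(hypot(s[0] - d[0], s[1] - d[1]) for d in dst) for s in src
        ) / len(src)

    return 0.5 * (one_way(a, b) + one_way(b, a))


def threshold_query(db, query, tau):
    """Return the key of the series in db (a non-empty dict) whose
    above-threshold intervals are most similar to those of the query."""
    q_intervals = threshold_intervals(query, tau)
    return min(
        db,
        key=lambda name: interval_set_distance(
            threshold_intervals(db[name], tau), q_intervals
        ),
    )


# Toy data: x1 exceeds tau = 2 at the same times as the query, although its
# amplitudes differ; x2 exceeds tau only at the first and last time points.
db = {"x1": [0, 5, 6, 0, 0, 7, 0], "x2": [9, 0, 0, 0, 0, 0, 9]}
query = [0, 3, 4, 0, 0, 3, 0]
best = threshold_query(db, query, tau=2)
```

In this toy example the query and `x1` yield identical interval sequences even though their values differ, which mirrors the point made above that only the duration of above-threshold events matters, not the amplitude.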