
Proceedings of the ... International Conference on Data Science and Advanced Analytics. IEEE International Conference on Data Science and Advanced Analytics: Latest Articles

Learning Personalized Treatment Rules from Electronic Health Records Using Topic Modeling Feature Extraction.
Peng Wu, Tianchen Xu, Yuanjia Wang

To address substantial heterogeneity in patient response to treatment of chronic disorders and achieve the promise of precision medicine, individualized treatment rules (ITRs) are estimated to tailor treatments to patient-specific characteristics. Randomized controlled trials (RCTs) provide gold-standard data for learning ITRs that are not subject to confounding bias. However, RCTs are often conducted under stringent inclusion/exclusion criteria, and RCT participants may not reflect the general patient population. Thus, ITRs learned from RCTs lack generalizability to the broader real-world patient population. Real-world databases such as electronic health records (EHRs) provide new resources that complement RCTs and facilitate evidence-based research for personalized medicine. However, to ensure the validity of ITRs learned from EHRs, a number of challenges, including confounding bias and selection bias, must be addressed. In this work, we propose a matching-based machine learning method to estimate optimal individualized treatment rules from EHRs using interpretable features extracted from EHR documentation of medications and ICD diagnosis codes. We use a latent Dirichlet allocation (LDA) model to extract latent topics and weights as features for learning ITRs. Our method reduces confounding in observational studies by matching treated and untreated individuals, and improves treatment optimization by augmenting the feature space with clinically meaningful LDA-based features. We apply the method to EHR data collected at the New York Presbyterian Hospital clinical data warehouse to study optimal second-line treatment for patients with type 2 diabetes (T2D). Using cross-validation, we show that ITRs outperform uniform treatment strategies (i.e., assigning the same treatment to all individuals), and that including topic modeling features leads to a greater reduction in post-treatment complications.

DOI: 10.1109/dsaa.2019.00054 | Published: 2019-10-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7035126/pdf/nihms-1557992.pdf
Citations: 0
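The matching step described in the abstract above can be illustrated with a small sketch. It is not the authors' estimator: it assumes 1:1 nearest-neighbor matching with replacement on Euclidean distance, assumes larger outcomes are better (for post-treatment complications one would flip the sign), and uses a hypothetical nearest-neighbor decision rule. In the paper's setting each feature vector would also carry LDA topic weights extracted from medication and ICD codes; here the features are toy 1-D values, and the names `matched_effects` and `make_rule` are made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Unit = Tuple[Sequence[float], float]  # (feature vector, outcome)


def nearest(x: Sequence[float], pool: List[Unit]) -> Unit:
    """Return the unit in `pool` whose features are closest to x (Euclidean)."""
    return min(pool, key=lambda unit: math.dist(x, unit[0]))


def matched_effects(treated: List[Unit], control: List[Unit]) -> List[Unit]:
    """1:1 nearest-neighbor matching with replacement, in both directions.

    Returns (features, imputed effect) pairs, where the imputed effect is
    Y(treated) - Y(control) for a unit and its matched counterpart.
    """
    effects = []
    for x, y in treated:
        _, y_ctrl = nearest(x, control)
        effects.append((x, y - y_ctrl))
    for x, y in control:
        _, y_trt = nearest(x, treated)
        effects.append((x, y_trt - y))
    return effects


def make_rule(effects: List[Unit]) -> Callable[[Sequence[float]], int]:
    """Hypothetical ITR: treat (1) if the nearest training unit's imputed effect is positive."""
    def rule(x: Sequence[float]) -> int:
        _, effect = nearest(x, effects)
        return 1 if effect > 0 else 0
    return rule


# Toy data: treatment helps when x > 0 and hurts when x < 0.
treated = [([-2.0], -2.0), ([-1.0], -1.0), ([1.0], 1.0), ([2.0], 2.0)]
control = [([-2.1], 0.0), ([-1.1], 0.0), ([1.1], 0.0), ([2.1], 0.0)]
rule = make_rule(matched_effects(treated, control))
print(rule([1.5]), rule([-1.5]))  # prints: 1 0
```

Matching in both directions gives every subject an imputed effect, so the rule can be queried anywhere in feature space; a real analysis would replace the nearest-neighbor rule with a learned classifier and check covariate balance after matching.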
Generalized Bayesian Factor Analysis for Integrative Clustering with Applications to Multi-Omics Data.
Eun Jeong Min, Changgee Chang, Qi Long
Integrative clustering is a clustering approach for multiple datasets that provide different views of a common group of subjects. It enables joint analysis of multi-omics data to, for example, identify subtypes of diseases, cells, and so on, capturing the complex underlying biological processes more precisely. On the other hand, there has been a great deal of interest over the past decade in incorporating prior structural knowledge about the features into statistical analyses. Knowledge of gene regulatory networks (pathways) can potentially be incorporated into many genomic studies. In this paper, we propose a novel integrative clustering method that can incorporate prior graph knowledge. We first develop a generalized Bayesian factor analysis (GBFA) framework, a sparse Bayesian factor analysis that can take graph information into account. Our GBFA framework employs the spike-and-slab lasso (SSL) prior to impose sparsity on the factor loadings and the Markov random field (MRF) prior to encourage smoothing over adjacent factor loadings, which establishes a unified shrinkage adaptive to both the loading size and the graph structure. We then use the framework to extend iCluster+, a factor-analysis-based integrative clustering approach. A novel variational EM algorithm is proposed to efficiently compute the MAP estimate of the factor loadings. Extensive simulation studies and an application to the NCI60 cell line dataset demonstrate that the proposed method is superior and delivers more biologically meaningful results.
DOI: 10.1109/DSAA.2018.00021 | Published: 2018-10-01
Citations: 9
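The two priors named in the GBFA abstract can be made concrete with a short sketch. The spike-and-slab lasso part below uses the generic two-Laplace mixture form (a large-rate "spike" concentrated at zero and a small-rate "slab"); the MRF part is shown as an Ising-type prior on inclusion indicators, which is one common way to encode graph smoothing and is an assumption here, not necessarily the paper's parameterization. The parameters `theta`, `lam_spike`, `lam_slab`, `a`, and `b` and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def laplace(beta: float, lam: float) -> float:
    """Laplace density (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_density(beta: float, theta: float, lam_spike: float, lam_slab: float) -> float:
    """Spike-and-slab lasso prior: (1 - theta) * Laplace(lam_spike) + theta * Laplace(lam_slab).

    lam_spike is large (mass concentrated near 0); lam_slab is small (diffuse).
    """
    return (1 - theta) * laplace(beta, lam_spike) + theta * laplace(beta, lam_slab)


def slab_probability(beta: float, theta: float, lam_spike: float, lam_slab: float) -> float:
    """Conditional probability that a loading of size beta belongs to the slab.

    Not numerically hardened: for very large |beta| both terms can underflow to 0.
    """
    slab = theta * laplace(beta, lam_slab)
    spike = (1 - theta) * laplace(beta, lam_spike)
    return slab / (slab + spike)


def mrf_prior_inclusion(j: int, gamma: Sequence[int], neighbors: Dict[int, List[int]],
                        a: float = -2.0, b: float = 1.0) -> float:
    """Ising-type MRF (hypothetical a, b): logit P(gamma_j = 1 | rest) = a + b * #included neighbors."""
    s = sum(gamma[k] for k in neighbors.get(j, []))
    return 1.0 / (1.0 + math.exp(-(a + b * s)))


print(slab_probability(0.0, 0.5, 20.0, 1.0))  # small: a zero loading is attributed to the spike
print(slab_probability(1.0, 0.5, 20.0, 1.0))  # close to 1: a large loading escapes heavy shrinkage
```

With `b > 0`, a feature whose graph neighbors are already "in" the model gets a higher prior inclusion probability, which is the sense in which the graph encourages neighboring loadings to behave alike.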
Outcome-Weighted Learning for Personalized Medicine with Multiple Treatment Options.
Xuan Zhou, Yuanjia Wang, Donglin Zeng

To achieve personalized medicine, an individualized treatment strategy that assigns treatment based on an individual's characteristics so as to yield the largest benefit can be considered. Recently, a machine learning approach, O-learning, has been proposed to estimate an optimal individualized treatment rule (ITR), but it was developed to make binary decisions and is thus limited to comparing two treatments. When many treatment options are available, existing methods need to be adapted by transforming a multiple treatment selection problem into multiple binary treatment selections, for example, via one-vs-one or one-vs-all comparisons. However, combining multiple binary treatment selection rules into a single decision rule requires careful consideration, because it is known in the multicategory learning literature that some approaches may lead to ambiguous decision rules. In this work, we propose a novel and efficient method to generalize outcome-weighted learning for binary treatment to multi-treatment settings. We solve the multiple treatment selection problem via sequential weighted support vector machines. We prove that the resulting ITR is Fisher consistent and obtain the convergence rate of the estimated value function to the true optimal value; i.e., the estimated treatment rule leads to the maximal benefit as the sample size goes to infinity. We conduct simulations demonstrating that the proposed method has superior performance in terms of lower misallocation rates and improved expected values. An application to a three-arm randomized trial of major depressive disorder shows that an ITR tailored to each patient's expectancy of treatment efficacy, baseline depression severity, and other characteristics reduces depressive symptoms more than non-personalized treatment strategies (e.g., treating all patients with combined pharmacotherapy and psychotherapy).

DOI: 10.1109/DSAA.2018.00072 | Published: 2018-10-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437674/pdf/nihms-1009424.pdf
Citations: 0
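The "value function" in the abstract above is the expected outcome if everyone were treated according to a rule. In a randomized trial with known allocation probabilities it is commonly estimated by inverse probability weighting, which is the quantity outcome-weighted learning seeks to maximize. The sketch below computes that estimate and compares a personalized rule with a uniform one on a toy three-arm trial; it does not implement the paper's sequential weighted SVMs, and the data, the equal 1/3 allocation, and the name `ipw_value` are hypothetical.

```python
from typing import Callable, Dict, Sequence


def ipw_value(rule: Callable[[object], int], X: Sequence[object], A: Sequence[int],
              Y: Sequence[float], propensity: Dict[int, float]) -> float:
    """IPW estimate of E[Y] had everyone been treated according to `rule`.

    Subjects whose received arm A_i agrees with rule(X_i) contribute
    Y_i / P(A_i); all other subjects contribute 0.
    """
    total = 0.0
    for x, a, y in zip(X, A, Y):
        if rule(x) == a:
            total += y / propensity[a]
    return total / len(Y)


# Toy three-arm trial with equal randomization (hypothetical data).
X = [0, 1, 2, 0, 1, 2]              # a single discrete covariate
A = [0, 1, 2, 1, 2, 0]              # arm actually received
Y = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]  # larger is better
p = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

personalized = ipw_value(lambda x: x, X, A, Y, p)  # arm chosen from the covariate
uniform = ipw_value(lambda x: 2, X, A, Y, p)       # everyone gets arm 2
print(personalized, uniform)
```

On this toy data the personalized rule has an estimated value of 4.5 versus 2.0 for the uniform rule (up to floating-point rounding), mirroring the kind of comparison the abstract reports between ITRs and non-personalized strategies.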
A Novel Approach for Estimating Multiple Sparse Precision Matrices Using ℓ0, 0 Regularization
Phan Duy Nhat, Hoai An Le Thi
DOI: 10.1109/DSAA.2017.40 | Published: 2017-01-01
Citations: 0
The Highly Adaptive Lasso Estimator.
David Benkeser, Mark van der Laan

Estimation of a regression function is a common goal of statistical learning. We propose a novel nonparametric regression estimator that, in contrast to many existing methods, neither relies on local smoothness assumptions nor is constructed using local smoothing techniques. Instead, our estimator respects global smoothness constraints by virtue of falling in a class of right-continuous functions with left-hand limits whose variation norm is bounded by a constant. Using empirical process theory, we establish a fast minimal rate of convergence of our proposed estimator and illustrate how such an estimator can be constructed using standard software. In simulations, we show that the finite-sample performance of our estimator is competitive with other popular machine learning techniques across a variety of data-generating mechanisms. We also illustrate competitive performance in real data examples using several publicly available data sets.

DOI: 10.1109/DSAA.2016.93 | Published: 2016-01-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5662030/pdf/nihms870895.pdf
Citations: 0
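The "standard software" construction mentioned in the abstract above is usually described as follows: build one indicator basis function for every observation and every non-empty subset ("section") of covariates, then fit the coefficients with a lasso, whose L1 norm bounds the variation norm of the fitted function. The sketch below builds only the zero-order basis and design matrix; the lasso fit itself is left to any L1-penalized regression solver and is not shown. Function names are hypothetical, and duplicate basis functions are deliberately not pruned.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Basis = Tuple[Tuple[int, ...], Tuple[float, ...]]  # (section, knot)


def hal_basis(X: Sequence[Sequence[float]]) -> List[Basis]:
    """Zero-order basis: one indicator per (non-empty section s, observation i)."""
    d = len(X[0])
    basis = []
    for size in range(1, d + 1):
        for section in combinations(range(d), size):
            for row in X:
                basis.append((section, tuple(row[j] for j in section)))
    return basis


def evaluate(b: Basis, x: Sequence[float]) -> int:
    """phi_{s,knot}(x) = 1 if x_j >= knot_j for every j in s, else 0."""
    section, knot = b
    return int(all(x[j] >= k for j, k in zip(section, knot)))


def design_matrix(X: Sequence[Sequence[float]], basis: List[Basis]) -> List[List[int]]:
    """Rows are observations, columns are basis functions."""
    return [[evaluate(b, x) for b in basis] for x in X]


def variation_norm(coefs: Sequence[float]) -> float:
    """For f = b0 + sum_k b_k * phi_k, the variation norm of f is at most sum_k |b_k|."""
    return sum(abs(c) for c in coefs)


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
basis = hal_basis(X)
print(len(basis))  # prints: 9  (n * (2**d - 1) = 3 * 3)
```

The basis grows as n(2^d - 1), which is why practical implementations restrict the sections or knots; constraining `variation_norm` of the coefficients is what the lasso penalty enforces.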
Toward personal knowledge bases
S. Abiteboul
A Web user today has his/her data and information distributed across a number of services that operate in silos. Computer wizards already know how to control their personal data to some extent. It is now becoming possible for everyone to do the same, and there are many advantages to doing so. Everyone should now be in a position to manage his/her personal information. Furthermore, we will argue that we should move towards personal knowledge bases and discuss the advantages of doing so. We will mention recent work around a datalog dialect, namely Webdamlog.
DOI: 10.1109/DSAA.2015.7344775 | Published: 2015-12-07
Citations: 1
Special session on trends & controversies in data science (TCDS)
F. Forbes, Wray L. Buntine
As an emerging area, data science faces great opportunities as well as challenges. Several questions are often debated: What is data science? Why data science? We already have information science, so why do we need data science? Do we need an analytics science? Is analytics new? What is the difference between statistics and data analytics? What makes a data scientist? We believe that a special session on trends and controversies in data science and advanced analytics can bring insights from different mindsets to the healthy development of both the science and society. Accordingly, this T&C special session will host invited talks outlining different views on the present and future of data science. Invited speakers may contribute a paper to the special session (in the same format as main-conference submissions, but possibly shorter than 10 pages); these papers will be handled by the program co-chairs and may be accepted into the main conference proceedings after the authors address the co-chairs' comments.
DOI: 10.1109/DSAA.2015.7344776 | Published: 2015-12-07
Citations: 1
The sexy job in the next ten years will be statisticians
É. Moulines
The goal of exploratory data analysis or data mining is to make sense of data. We develop theory and algorithms that help us understand our data, with the aim of formulating better hypotheses. The role of statisticians is to provide methods that give detailed insight into how data is structured: characterizing distributions in easily understandable terms and showing the most informative patterns, associations, correlations, etc. Statisticians are part of the big data science wave, but which part exactly, alongside data accessibility, data communication, and visualization?
DOI: 10.1109/DSAA.2015.7344777 | Published: 2015-10-01
Citations: 0
Welcome from DSAA 2014 chairs
Philip S. Yu, M. Kitsuregawa, H. Motoda, Bart Goethals, M. Guo, Longbing Cao, G. Karypis, Irwin King, Wei Wang
A data-driven approach to scientific discovery is already widely accepted as an important emerging paradigm for computing in areas including social computing, services, the Internet of Things (or sensor networks), and the cloud. Under this paradigm, Big Data is the core that drives new research in many areas, from the environment to society. The big data phenomenon poses many new scientific challenges, spanning capture, creation, storage, search, sharing, analysis, and visualization. The complications lie not only in storage, I/O, querying, and performance, but also in integrating heterogeneous, interdependent, complex data resources for real-time decision-making, collaboration, and ultimately value co-creation. Data science encompasses the broader areas of data analytics, machine learning, and big data management. Advanced data analytics has become essential for gaining a deep understanding of large data sets and converting data into actionable intelligence. With the rapid growth in the volume of data available to enterprises, governments, and on the web, automated techniques for analyzing that data have become essential.
DOI: 10.1109/DSAA.2014.7058034 | Published: 2014-03-10
Citations: 0
Similarity analysis of service descriptions for efficient Web service discovery
S. Sowmya Kamath, S. Ananthanarayana V.
DOI: 10.1109/DSAA.2014.7058065 | Published: 2014-01-01
Citations: 0