
ACM/IMS transactions on data science: Latest Publications

Recent Developments in Privacy-Preserving Mining of Clinical Data.
Pub Date: 2021-11-01 DOI: 10.1145/3447774
Chance Desmet, Diane J Cook

With the dramatic increases in both the capability to collect personal data and the capability to analyze large amounts of data, increasingly sophisticated and personal insights are being drawn. These insights are valuable for clinical applications but also open up possibilities for identification and abuse of personal information. In this paper, we survey recent research on classical methods of privacy-preserving data mining. Looking at dominant techniques and recent innovations to them, we examine the applicability of these methods to the privacy-preserving analysis of clinical data. We also discuss promising directions for future research in this area.
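The survey covers classical privacy-preserving techniques in general terms. As a concrete illustration only (this code is not from the paper), the sketch below shows the Laplace mechanism from differential privacy, one of the classical methods such a survey would examine, applied to a hypothetical clinical counting query.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy."""
    scale = sensitivity / epsilon  # more privacy (smaller epsilon) -> more noise
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical clinical query: how many patients have a given diagnosis?
# A counting query has sensitivity 1: adding or removing one patient record
# changes the count by at most 1.
true_count = 42
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"true count: {true_count}, private release: {noisy_count:.1f}")
```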

{"title":"Recent Developments in Privacy-Preserving Mining of Clinical Data.","authors":"Chance Desmet, Diane J Cook","doi":"10.1145/3447774","DOIUrl":"10.1145/3447774","url":null,"abstract":"<p><p>With the dramatic increases in both the capability to collect personal data and the capability to analyze large amounts of data, increasingly sophisticated and personal insights are being drawn. These insights are valuable for clinical applications but also open up possibilities for identification and abuse of personal information. In this paper, we survey recent research on classical methods of privacy-preserving data mining. Looking at dominant techniques and recent innovations to them, we examine the applicability of these methods to the privacy-preserving analysis of clinical data. We also discuss promising directions for future research in this area.</p>","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8746818/pdf/nihms-1678257.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39814074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Survey on the Role of Centrality as Seed Nodes for Information Propagation in Large Scale Network
Pub Date: 2021-08-21 DOI: 10.1145/3465374
Paramita Dey, Subhayan Bhattacharya, Sarbani Roy
Following the popular concept of six-degree separation, social networks are generally analyzed from the perspective of small-world networks, where the centrality of nodes plays a pivotal role in information propagation. However, a large dataset from a scale-free network (which follows a power law) may behave differently because of the nature of the social graph. Moreover, deriving centrality may be difficult because of the computational complexity of the centrality measures themselves. This study provides a comprehensive and extensive review and comparison of seven centrality measures (clustering coefficient, node degree, k-core, betweenness, closeness, eigenvector, PageRank) using four information propagation methods (Breadth-First Search, Random Walk, Susceptible-Infected-Removed, Forest Fire). Five benchmark similarity measures (Tanimoto, Hamming, Dice, Sorensen, Jaccard) are used to compare the seed nodes identified by the centrality measures with the actual source seeds derived through Google's LargeStar-SmallStar algorithm on Twitter stream data. MapReduce is used both for identifying the seed nodes based on centrality measures and for the information propagation simulation. Most of the centrality measures perform well compared with the actual sources in the initial stage, but they saturate after a certain level of influence maximization, in terms of both affected nodes and propagation level.
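As a minimal sketch of the paper's core comparison (not its MapReduce implementation, and assuming the networkx library), the snippet below picks top-k seed sets under three of the seven centrality measures on a synthetic scale-free graph and compares them with Jaccard similarity, one of the five benchmark measures used in the study.

```python
import networkx as nx

def top_k_seeds(scores: dict, k: int) -> set:
    """The k highest-scoring nodes under a given centrality measure."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two seed sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Synthetic scale-free graph standing in for the Twitter stream data.
G = nx.barabasi_albert_graph(1000, 3, seed=1)
k = 20
seeds = {
    "degree": top_k_seeds(dict(G.degree()), k),
    "betweenness": top_k_seeds(nx.betweenness_centrality(G), k),
    "pagerank": top_k_seeds(nx.pagerank(G), k),
}
for name, s in seeds.items():
    print(f"{name:12s} vs pagerank: Jaccard = {jaccard(s, seeds['pagerank']):.2f}")
```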
Citations: 5
PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning
Pub Date: 2021-08-21 DOI: 10.1145/3465375
Jie Song, Qiang He, Feifei Chen, Ye Yuan, Ge Yu
In big data query processing, there is a trade-off between query accuracy and query efficiency; for example, sampling-based query approaches trade query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly giving up the possibility of query completeness, that is, the chance that a query is complete. To quantify this possibility, we define a new concept, the Probability of query Completeness (hereinafter referred to as PC). For example, if a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among the 100 results. Leveraging probabilistic data placement and scanning, we trade PC for query performance. In this article, we propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. Experimental results on HiBench prove that PoBery can significantly accelerate queries while ensuring the PC; specifically, the percentage of complete queries is guaranteed to be larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7×, 1.1×, and 1.5× faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.
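PoBery's actual placement and scanning strategy is more involved; the toy Monte Carlo below merely illustrates the PC concept the abstract defines. If each block holding relevant data is scanned independently with some probability, the chance that a query sees all of them, and is therefore complete, follows directly.

```python
import random

def estimate_pc(num_blocks: int, scan_prob: float, trials: int = 100_000) -> float:
    """Monte Carlo estimate of the Probability of query Completeness (PC):
    a query is complete only if every block holding relevant data is scanned."""
    complete = sum(
        all(random.random() < scan_prob for _ in range(num_blocks))
        for _ in range(trials)
    )
    return complete / trials

# Three relevant blocks, each scanned with probability 0.99:
# PC is about 0.99 ** 3 ~= 0.970, i.e., roughly 3 incomplete results per 100.
print(f"estimated PC: {estimate_pc(num_blocks=3, scan_prob=0.99):.3f}")
```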
Citations: 0
DataStorm: Coupled, Continuous Simulations for Complex Urban Environments
Pub Date: 2021-07-12 DOI: 10.1145/3447572
H. Behrens, K. Candan, Xilun Chen, Yash Garg, Mao-Lin Li, Xinsheng Li, Sicong Liu, M. Sapino, M. Shadab, Dalton Turner, Magesh Vijayakumaren
Urban systems are characterized by complexity and dynamicity. Data-driven simulations represent a promising approach to understanding and predicting complex dynamic processes in the presence of shifting demands on urban systems. Yet, today's silo-based, decoupled simulation engines fail to provide an end-to-end view of the complex urban system, preventing informed decision-making. In this article, we present DataStorm to support integration of existing simulation, analysis, and visualization components into integrated workflows. DataStorm provides a flow engine, DataStorm-FE, for coordinating data and decision flows among multiple actors (each representing a model, analytic operation, or decision criterion) and enables ensemble planning and optimization across cloud resources. DataStorm provides native support for creating simulation ensembles through parameter-space sampling to decide which simulations to run, as well as for distributed instantiation and parallel execution of simulation instances on cluster resources. Recognizing that simulation ensembles are inherently sparse relative to the potential parameter space, we also present a density-boosting partition-stitch sampling scheme: a sub-space partitioning scheme increases the effective density of the simulation ensemble, complemented by an efficient stitching mechanism that leverages partial and imperfect knowledge from partial dynamical systems to obtain a global view of the complex urban process being simulated.
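As a rough sketch of one ingredient, ensemble creation through parameter-space sampling (the parameter names below are hypothetical, not DataStorm's API), the snippet enumerates a simulation parameter grid and samples a budget-limited subset of configurations to instantiate.

```python
import itertools
import random

def sample_ensemble(param_grid: dict, budget: int, seed: int = 0) -> list:
    """Sample a budget-limited subset of the full parameter cross-product
    to instantiate as simulation runs."""
    keys = list(param_grid)
    full_space = [dict(zip(keys, combo))
                  for combo in itertools.product(*(param_grid[k] for k in keys))]
    return random.Random(seed).sample(full_space, min(budget, len(full_space)))

# Hypothetical knobs for a coupled urban simulation ensemble.
grid = {
    "population_growth": [0.01, 0.02, 0.03],
    "water_demand_factor": [0.8, 1.0, 1.2],
    "peak_heat_index": [30, 35, 40],
}
for run_config in sample_ensemble(grid, budget=5):
    print(run_config)
```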
Citations: 0
TabReformer: Unsupervised Representation Learning for Erroneous Data Detection
Pub Date: 2021-05-17 DOI: 10.1145/3447541
Mona Nashaat, Aindrila Ghosh, James Miller, Shaikh Quader
Error detection is a crucial preliminary phase in any data analytics pipeline. Existing error detection techniques typically target specific types of errors, and most of these detection models require either user-defined rules or ample hand-labeled training examples. Therefore, in this article, we present TabReformer, a model that learns bidirectional encoder representations for tabular data. The proposed model consists of two main phases. In the first phase, TabReformer follows an encoder architecture with multiple self-attention layers to model the dependencies between cells and capture tuple-level representations. The model also pairs a Gaussian Error Linear Unit activation function with a Masked Data Model objective to achieve a deeper probabilistic understanding of the data. In the second phase, the model parameters are fine-tuned for the task of erroneous data detection, with a data augmentation module generating additional erroneous examples to represent the minority class. The experimental evaluation considers a wide range of databases with different types of errors and distributions. The empirical results show that our solution can enhance recall by 32.95% on average compared with state-of-the-art techniques while reducing manual effort by up to 48.86%.
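Two ingredients the abstract names, the GELU activation and the masking idea behind a Masked Data Model objective, are easy to state concretely. The sketch below is illustrative only and is not TabReformer's code; the masking helper in particular is a hypothetical simplification.

```python
import math
import random

def gelu(x: float) -> float:
    """Gaussian Error Linear Unit: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mask_cells(row: list, mask_prob: float = 0.15) -> list:
    """Toy Masked Data Model setup: hide a fraction of cells so an encoder
    must reconstruct them from the remaining tuple context."""
    return [None if random.random() < mask_prob else cell for cell in row]

print(f"gelu(1.0) = {gelu(1.0):.4f}")          # ~0.8413
print(mask_cells(["Alice", 34, "NY", 72000]))  # some cells replaced by None
```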
Citations: 1
Boosting the Restoring Performance of Deduplication Data by Classifying Backup Metadata
Pub Date: 2021-04-21 DOI: 10.1145/3437261
Ru Yang, Yuhui Deng, Yi Zhou, Ping Huang
Restoring data is the main purpose of data backup in storage systems. The fragmentation issue, caused by physically scattering logically continuous data across a variety of disk locations, has a negative impact on the restoring performance of a deduplication system. Rewriting algorithms alleviate the fragmentation problem and thereby improve the restoring speed of a deduplication system. However, rewriting methods sacrifice a great deal of deduplication ratio, leading to a huge waste of storage space. Furthermore, traditional backup approaches treat file metadata and chunk metadata the same way, which causes frequent on-disk metadata accesses. In this article, we start by analyzing the storage characteristics of backup metadata. An intriguing finding is that, with 10 million files, the file metadata takes up only approximately 340 MB. Motivated by this finding, we propose a Classified-Metadata based Restoring method (CMR) that classifies backup metadata into file metadata and chunk metadata. Because the file metadata occupies only a meager amount of space, CMR maintains all file metadata in memory, whereas chunk metadata are aggressively prefetched into memory in a greedy manner. A deduplication system with CMR in place exhibits three salient features: (i) it avoids the additional overhead of rewriting algorithms by reducing the number of disk reads in a restoring process, (ii) it increases the restoring throughput without sacrificing the deduplication ratio, and (iii) it thoroughly leverages the hardware resources to boost the restoring performance. To quantitatively evaluate the performance of CMR, we compare it against two state-of-the-art approaches, namely a history-aware rewriting method (HAR) and a context-based rewriting scheme (CAP). The experimental results show that, compared with HAR and CAP, CMR reduces the restoring time by 27.2% and 29.3%, respectively, while improving the deduplication ratio by 1.91% and 4.36%, respectively.
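A minimal stand-in for CMR's metadata split (illustrative, not the authors' implementation): all file metadata is pinned in memory, since the paper finds it small, while chunk metadata flows through a bounded cache. The LRU policy here is an assumption standing in for the paper's greedy prefetching.

```python
from collections import OrderedDict

class ClassifiedMetadataStore:
    """File metadata always in RAM; chunk metadata cached with bounded memory."""

    def __init__(self, chunk_cache_size: int):
        self.file_meta = {}                 # file_id -> metadata, pinned in memory
        self.chunk_cache = OrderedDict()    # chunk_id -> metadata, bounded cache
        self.cache_size = chunk_cache_size

    def add_file(self, file_id, meta):
        self.file_meta[file_id] = meta      # cheap: ~340 MB for 10M files

    def get_chunk(self, chunk_id, fetch_from_disk):
        if chunk_id in self.chunk_cache:
            self.chunk_cache.move_to_end(chunk_id)   # cache hit, no disk read
            return self.chunk_cache[chunk_id]
        meta = fetch_from_disk(chunk_id)              # one on-disk metadata access
        self.chunk_cache[chunk_id] = meta
        if len(self.chunk_cache) > self.cache_size:
            self.chunk_cache.popitem(last=False)      # evict the oldest entry
        return meta

store = ClassifiedMetadataStore(chunk_cache_size=2)
store.add_file("f1", {"name": "report.doc", "chunks": ["c1", "c2", "c3"]})
for cid in ["c1", "c2", "c1", "c3"]:
    store.get_chunk(cid, fetch_from_disk=lambda c: {"fingerprint": hash(c)})
```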
Citations: 2
Introduction to the Special Issue on Learning-based Support for Data Science Applications
Pub Date: 2021-04-08 DOI: 10.1145/3450751
Ke Zhou, Jingkuan Song
This issue of ACM/IMS Transactions on Data Science (TDS) contains a collection of 6 articles drawn from 25 submissions to the journal. TDS is a Gold Open Access journal that publishes articles on cross-disciplinary innovative research ideas, algorithms, systems, theory, and applications for data science. Within the context of big data, articles fall within the scope of the journal if they address challenges at any stage, from acquisition through data cleaning, transformation, representation, integration, indexing, modeling, analysis, visualization, and interpretation, while retaining privacy, fairness, provenance, transparency, and provision of social benefit. The six accepted articles in this issue are representative of their respective fields, including communication, image processing, natural language processing, dark data, text recognition, and data deduplication. In keeping with the traits of their fields, these articles apply appropriate machine learning technologies and achieve convincing results. We believe these achievements can provide machine learning-based solutions for real-life applications and inspire more practical problems to be solved via machine learning techniques. In "Deep Hash-based Relevance-aware Data Quality Assessment for Image Dark Data," the authors propose a deep hash-based framework called DHR-DQA to lighten and assess image dark data. This framework combines deep learning, hashing, graph techniques, and computer vision to explore a very advanced application, which is enlightening. In "Boosting the Restoring Performance of Deduplication Data by Classifying Backup Metadata," the authors utilize machine learning techniques to complete a classic task in the storage field, and the results show the universality of machine learning techniques. From these articles, we observe that machine learning can solve almost all of these application problems under reasonable conditions. We hope readers enjoy this special issue and find that these articles enlighten their work.
Citations: 0
Modeling Temporal Patterns of Cyberbullying Detection with Hierarchical Attention Networks
Pub Date: 2021-04-02 DOI: 10.1145/3441141
Lu Cheng, Ruocheng Guo, Yasin N. Silva, Deborah L. Hall, Huan Liu
Cyberbullying is rapidly becoming one of the most serious online risks for adolescents. This has motivated work on machine learning methods to automate the process of cyberbullying detection, which have so far mostly viewed cyberbullying as one-off incidents that occur at a single point in time. Comparatively less is known about how cyberbullying behavior occurs and evolves over time. This oversight highlights a crucial open challenge for cyberbullying-related research, given that cyberbullying is typically defined as intentional acts of aggression via electronic communication that occur repeatedly and persistently. In this article, we center our discussion on the challenge of modeling temporal patterns of cyberbullying behavior. Specifically, we investigate how temporal information within a social media session, which has an inherently hierarchical structure (e.g., words form a comment and comments form a session), can be leveraged to facilitate cyberbullying detection. Recent findings from interdisciplinary research suggest that the temporal characteristics of bullying sessions differ from those of non-bullying sessions and that the temporal information from users’ comments can improve cyberbullying detection. The proposed framework consists of three distinctive features: (1) a hierarchical structure that reflects how a social media session is formed in a bottom-up manner; (2) attention mechanisms applied at the word- and comment-level to differentiate the contributions of words and comments to the representation of a social media session; and (3) the incorporation of temporal features in modeling cyberbullying behavior at the comment-level. Quantitative and qualitative evaluations are conducted on a real-world dataset collected from Instagram, the social networking site with the highest percentage of users reporting cyberbullying experiences. Results from empirical evaluations show the significance of the proposed methods, which are tailored to capture temporal patterns of cyberbullying detection.
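A minimal sketch of the bottom-up hierarchy in feature (1), assuming NumPy and omitting the trainable parameters and the temporal features of the full model: soft attention pools word vectors into comment vectors, and comment vectors into a session vector.

```python
import numpy as np

def attention_pool(vectors: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Soft attention: weight each vector by its scaled dot-product score
    against a query vector, then take the weighted sum."""
    scores = vectors @ query / np.sqrt(vectors.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ vectors

rng = np.random.default_rng(0)
dim = 16
# Three comments of 5, 8, and 3 word embeddings each (random stand-ins).
comments = [rng.normal(size=(n, dim)) for n in (5, 8, 3)]
q_word, q_comment = rng.normal(size=dim), rng.normal(size=dim)

# Bottom-up: words -> comment vectors -> one session representation.
comment_vecs = np.stack([attention_pool(c, q_word) for c in comments])
session_vec = attention_pool(comment_vecs, q_comment)
print(session_vec.shape)  # (16,)
```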
Citations: 18
Deep Hash-based Relevance-aware Data Quality Assessment for Image Dark Data
Pub Date: 2021-04-02 DOI: 10.1145/3420038
Yu Liu, Yangtao Wang, Lianli Gao, Chan Guo, Yanzhao Xie, Zhili Xiao
Data mining constantly faces a problem it can hardly solve on its own: a dataset may contain little information that is meaningful for a given requirement. Faced with multiple unknown datasets, allocating data mining resources so as to acquire more of the desired data requires a data quality assessment framework based on the relevance between each dataset and the requirements. Such a framework helps the user judge the potential benefits in advance and thus optimize the allocation of resources among the candidate datasets. However, unstructured data (e.g., image data) often exists as dark data, which makes it tricky for the user to understand, in real time, the relevance based on the content of the dataset. Even when all data carry label descriptions, how to efficiently measure the relevance between data under semantic propagation remains an urgent problem. We therefore propose a Deep Hash-based Relevance-aware Data Quality Assessment framework, which contains an off-line learning and relevance-mining part as well as an on-line assessing part. In the off-line part, we first design a Graph Convolution Network (GCN)-AutoEncoder hash (GAH) algorithm to recognize the data (i.e., to lighten the dark data), then construct a graph with a restricted Hamming distance, and finally design a Cluster PageRank (CPR) algorithm that calculates an importance score for each node (image), yielding a relevance representation based on semantic propagation. In the on-line part, we retrieve the importance scores by hash codes and then quickly draw the assessment conclusion from the importance list. On the one hand, the introduction of the GCN and co-occurrence probabilities in GAH improves the ability to perceive dark data. On the other hand, CPR exploits hash collisions to reduce the scale of the graph and the iteration matrix, greatly decreasing the consumption of space and computing resources. We conduct extensive experiments on both single-label and multi-label datasets to assess the relevance between data and requirements and to test the resource allocation. Experimental results show that our framework obtains the most desired data for the same mining resources. Moreover, test results on the Tencent1M dataset demonstrate that the framework completes the assessment stably under different given requirements.
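To make the off-line pipeline's middle steps concrete, here is a sketch under assumptions: it uses plain networkx PageRank rather than the authors' CPR, which additionally exploits hash collisions to shrink the graph. Hash codes are connected when their Hamming distance falls within a restriction radius, and each node is then scored.

```python
import itertools
import networkx as nx

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary hash codes."""
    return sum(x != y for x, y in zip(a, b))

# Toy binary hash codes standing in for GAH outputs, one per image.
codes = {0: "0000", 1: "0001", 2: "0011", 3: "1111", 4: "1110", 5: "0111"}

# Restricted-Hamming-distance graph: edge only when distance <= radius.
radius = 1
G = nx.Graph()
G.add_nodes_from(codes)
for (u, cu), (v, cv) in itertools.combinations(codes.items(), 2):
    if hamming(cu, cv) <= radius:
        G.add_edge(u, v)

# Importance score per image node; higher means more central under
# semantic propagation over the hash graph.
for node, score in sorted(nx.pagerank(G).items(), key=lambda kv: -kv[1]):
    print(f"image {node} (code {codes[node]}): {score:.3f}")
```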
Citations: 2
Simultaneous Image Reconstruction and Feature Learning with 3D-CNNs for Image Set–Based Classification
Pub Date: 2021-04-02 DOI: 10.1145/3420037
Xinyu Zhang, Xiaocui Li, X.-Y. Jing, Li Cheng
Image set–based classification has attracted substantial research interest because of its broad applications. Recently, many methods based on feature learning or dictionary learning have been developed to solve this problem, and some of them have achieved gratifying results. However, most of them transform the image set into a 2D matrix or use 2D convolutional neural networks (CNNs) for feature learning, so spatial and temporal information is lost. At the same time, these methods extract features from the original images, which may exhibit huge intra-class diversity. To address these issues, we propose simultaneous image reconstruction with deep learning and feature learning with 3D-CNNs (SIRFL) for image set classification. The proposed SIRFL approach consists of a deep image reconstruction network and a 3D-CNN-based feature learning network. The deep image reconstruction network is used to reduce the diversity of images from the same set, and the feature learning network effectively retains spatial and temporal information by using 3D-CNNs. Extensive experimental results on five widely used datasets show that our SIRFL approach is a strong competitor to state-of-the-art image set classification methods.
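A minimal sketch of the 3D-CNN half of the idea, assuming PyTorch and not reproducing the SIRFL reconstruction network: the image set is stacked along a depth axis so that 3D convolutions see both spatial structure and cross-image (temporal) structure.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy 3D-CNN classifier over an image set stacked along a depth axis."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),  # input (B, 3, D, H, W)
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                    # global spatio-temporal pool
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# A batch of 2 image sets, each containing 10 RGB images of size 32x32.
x = torch.randn(2, 3, 10, 32, 32)
print(Tiny3DCNN(num_classes=5)(x).shape)  # torch.Size([2, 5])
```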
Citations: 0