Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: Latest Articles

iMIRACLE: an Iterative Multi-View Graph Neural Network to Model Intercellular Gene Regulation from Spatial Transcriptomic Data.
Ziheng Duan, Siwei Xu, Cheyu Lee, Dylan Riffle, Jing Zhang

Spatial transcriptomics has transformed genomic research by measuring spatially resolved gene expressions, allowing us to investigate how cells adapt to their microenvironment via modulating their expressed genes. This essential process usually starts from cell-cell communication (CCC) via ligand-receptor (LR) interaction, leading to regulatory changes within the receiver cell. However, few methods were developed to connect them to provide biological insights into intercellular regulation. To fill this gap, we propose iMiracle, an iterative multi-view graph neural network that models each cell's intercellular regulation with three key features. Firstly, iMiracle integrates inter- and intra-cellular networks to jointly estimate cell-type- and micro-environment-driven gene expressions. Optionally, it allows prior knowledge of intra-cellular networks as pre-structured masks to maintain biological relevance. Secondly, iMiracle employs iterative learning to overcome the sparsity of spatial transcriptomic data and gradually fill in the missing edges in the CCC network. Thirdly, iMiracle infers a cell-specific ligand-gene regulatory score based on the contributions of different LR pairs to interpret inter-cellular regulation. We applied iMiracle to nine simulated and eight real datasets from three sequencing platforms and demonstrated that iMiracle consistently outperformed ten methods in gene expression imputation and four methods in regulatory score inference. Lastly, we developed iMiracle as an open-source software and anticipate that it can be a powerful tool in decoding the complexities of inter-cellular transcriptional regulation.

DOI: 10.1145/3627673.3679574; Volume 2024, pp. 538-548; Published 2024-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639074/pdf/
Citations: 0
scACT: Accurate Cross-modality Translation via Cycle-consistent Training from Unpaired Single-cell Data.
Siwei Xu, Junhao Liu, Jing Zhang

Single-cell sequencing technologies have revolutionized genomics by enabling the simultaneous profiling of various molecular modalities within individual cells. Their integration, especially cross-modality translation, offers deep insights into cellular regulatory mechanisms. Many methods have been developed for cross-modality translation, but their reliance on scarce high-quality co-assay data limits their applicability. Addressing this, we introduce scACT, a deep generative model designed to extract cross-modality biological insights from unpaired single-cell data. scACT tackles three major challenges: aligning unpaired multi-modal data via adversarial training, facilitating cross-modality translation without prior knowledge via cycle-consistent training, and enabling interpretable regulatory interconnections explorations via in-silico perturbations. To test its performance, we applied scACT on diverse single-cell datasets and found it outperformed existing methods in all three tasks. Finally, we have developed scACT as an individual open-source software package to advance single-cell omics data processing and analysis within the research community.
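The cycle-consistent training mentioned in the abstract can be illustrated with a toy sketch. This is not the scACT implementation: the two translators below are hypothetical closed-form functions standing in for learned networks, and the loss is a plain mean absolute reconstruction error, chosen here only for illustration. The idea shown is that translating a profile from modality A to modality B and back should recover the original profile, which needs no paired measurements.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def cycle_consistency_loss(
    samples: Sequence[Vector],
    a_to_b: Callable[[Vector], Vector],
    b_to_a: Callable[[Vector], Vector],
) -> float:
    """Mean absolute error between each sample and its A -> B -> A reconstruction."""
    total = 0.0
    count = 0
    for x in samples:
        reconstructed = b_to_a(a_to_b(x))
        for original_value, recovered_value in zip(x, reconstructed):
            total += abs(original_value - recovered_value)
            count += 1
    return total / count


# Hypothetical translators: a perfect inverse pair and a mismatched pair.
def toy_a_to_b(x: Vector) -> Vector:
    return [2.0 * v + 1.0 for v in x]


def toy_b_to_a_exact(y: Vector) -> Vector:
    return [(v - 1.0) / 2.0 for v in y]


def toy_b_to_a_biased(y: Vector) -> Vector:
    return [v / 2.0 for v in y]


cells = [[0.0, 1.5], [3.0, 0.5]]
exact_loss = cycle_consistency_loss(cells, toy_a_to_b, toy_b_to_a_exact)
biased_loss = cycle_consistency_loss(cells, toy_a_to_b, toy_b_to_a_biased)
```

In training, a loss of this kind would be minimized with respect to the parameters of both translators; the mismatched pair above yields a larger loss than the exact inverse pair.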

DOI: 10.1145/3627673.3679576; Volume 2024, pp. 2722-2731; Published 2024-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11611688/pdf/
Citations: 0
HypMix: Hyperbolic Representation Learning for Graphs with Mixed Hierarchical and Non-hierarchical Structures.
Eric W Lee, Bo Xiong, Carl Yang, Joyce C Ho

Heterogeneous networks contain multiple types of nodes and links, with some link types encapsulating hierarchical structure over entities. Hierarchical relationships can codify information such as subcategories or one entity being subsumed by another and are often used for organizing conceptual knowledge into a tree-structured graph. Hyperbolic embedding models learn node representations in a hyperbolic space suitable for preserving the hierarchical structure. Unfortunately, current hyperbolic embedding models only implicitly capture the hierarchical structure, fail to distinguish between node types, and assume only a single tree. In practice, many networks contain a mixture of hierarchical and non-hierarchical structures, and the hierarchical relations may be represented as multiple trees with complex structures, such as sharing certain entities. In this work, we propose a new hyperbolic representation learning model that can handle complex hierarchical structures and also learn the representation of both hierarchical and non-hierarchical structures. We evaluate our model on several datasets and tasks, including node classification and identifying relevant articles for a systematic review, an essential tool for evidence-driven medicine.
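To make concrete why hyperbolic space suits hierarchies, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model HypMix uses, so the choice of the Poincaré ball is an assumption made for illustration only. Distances grow rapidly near the boundary of the unit ball, which leaves room for the exponentially many leaves of a tree.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    argument = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(argument)


origin = [0.0, 0.0]
near_center = poincare_distance(origin, [0.5, 0.0])
near_boundary = poincare_distance(origin, [0.9, 0.0])
```

From the origin, the distance to a point with Euclidean norm r equals 2·artanh(r), so the point at norm 0.9 is far more than 1.8 times as distant as the point at norm 0.5.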

DOI: 10.1145/3627673.3679940; Volume 2024, pp. 3852-3856; Published 2024-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11867734/pdf/
Citations: 0
Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation.
Baoyu Jing, Dawei Zhou, Kan Ren, Carl Yang

Spatiotemporal time series are usually collected by monitoring sensors placed at different locations and often contain missing values due to various failures, such as mechanical damage and Internet outages. Imputing the missing values is crucial for analyzing time series. When recovering a specific data point, most existing methods consider all the information relevant to that point regardless of the cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders can open backdoor paths and establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations can cause overfitting. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective and show how to block the confounders via the frontdoor adjustment. Based on the results of the frontdoor adjustment, we introduce a novel Causality-Aware Spatiotemporal Graph Neural Network (Casper), which contains a novel Prompt Based Decoder (PBD) and a Spatiotemporal Causal Attention (SCA). PBD can reduce the impact of confounders, and SCA can discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper can outperform the baselines and effectively discover the causal relationships.
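For reference, the frontdoor adjustment named in the abstract has the following standard form in causal inference. The symbols are assumptions introduced here, not taken from the paper: X is the input, Y the output, and M a mediator that carries the whole effect of X on Y and is not directly affected by the unobserved confounders. How Casper instantiates M is not stated in the abstract.

```latex
P\bigl(Y \mid do(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
```

The outer sum follows the effect of X on the mediator; the inner sum averages over all input values x' so that the backdoor path from M to Y through X is blocked.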

DOI: 10.1145/3627673.3679642; Volume 2024, pp. 1027-1037; Published 2024-01-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876796/pdf/
Citations: 0
Federated Node Classification over Distributed Ego-Networks with Secure Contrastive Embedding Sharing.
Han Xie, Li Xiong, Carl Yang

Federated learning on graphs (a.k.a. federated graph learning, FGL) has recently received increasing attention due to its capacity to enable collaborative learning over distributed graph datasets without compromising local clients' data privacy. In previous works, clients of FGL typically represent institutes or organizations that possess sets of entire graphs (e.g., molecule graphs in biochemical research) or parts of a larger graph (e.g., sub-user networks of e-commerce platforms). However, another natural paradigm exists where clients act as remote devices retaining the graph structures of local neighborhoods centered around the device owners (i.e., ego-networks), which can be modeled for specific graph applications such as user profiling on social ego-networks and infection prediction on contact ego-networks. FGL in such novel yet realistic ego-network settings faces the unique challenge of incomplete neighborhood information for non-ego local nodes, since they likely appear in multiple ego-networks with different sets of neighbors. To address this challenge, we propose an FGL method for distributed ego-networks in which clients obtain complete neighborhood information of local nodes through sharing node embeddings with other clients. A contrastive learning mechanism is proposed to bridge the gap between local and global node embeddings and stabilize the local training of graph neural network models, while a secure embedding sharing protocol is employed to protect individual node identity and embedding privacy against the server and other clients. Comprehensive experiments on various distributed ego-network datasets successfully demonstrate the effectiveness of our proposed embedding sharing method on top of different federated model sharing frameworks, and we also provide discussions on the potential efficiency and privacy drawbacks of the method as well as their future mitigation.

DOI: 10.1145/3627673.3679834; Volume 2024, pp. 2607-2617; Published 2024-01-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606401/pdf/
Citations: 0
Enabling Health Data Sharing with Fine-Grained Privacy.
Luca Bonomi, Sepand Gousheh, Liyue Fan

Sharing health data is vital in advancing medical research and transforming knowledge into clinical practice. Meanwhile, protecting the privacy of data contributors is of paramount importance. To that end, several privacy approaches have been proposed to protect individual data contributors in data sharing, including data anonymization and data synthesis techniques. These approaches have shown promising results in providing privacy protection at the dataset level. In this work, we study the privacy challenges in enabling fine-grained privacy in health data sharing. Our work is motivated by recent research findings, in which patients and healthcare providers may have different privacy preferences and policies that need to be addressed. Specifically, we propose a novel and effective privacy solution that enables data curators (e.g., healthcare providers) to protect sensitive data elements while preserving data usefulness. Our solution builds on randomized techniques to provide rigorous privacy protection for sensitive elements and leverages graphical models to mitigate privacy leakage due to dependent elements. To enhance the usefulness of the shared data, our randomized mechanism incorporates domain knowledge to preserve semantic similarity and adopts a block-structured design to minimize utility loss. Evaluations with real-world health data demonstrate the effectiveness of our approach and the usefulness of the shared data for health applications.
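As a generic illustration of what a randomized technique for protecting a single sensitive element can look like, the sketch below implements k-ary randomized response, a textbook mechanism that satisfies epsilon-local differential privacy. It is not the authors' mechanism, which according to the abstract also uses domain knowledge, graphical models, and a block-structured design; the function names and the four-value domain are hypothetical.

```python
import math
import random
from typing import List, Sequence


def keep_probability(epsilon: float, domain_size: int) -> float:
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)


def randomized_response(
    value: str, domain: Sequence[str], epsilon: float, rng: random.Random
) -> str:
    """Report the true value with the keep probability, else a uniform other value."""
    if value not in domain:
        raise ValueError("value must belong to the domain")
    if rng.random() < keep_probability(epsilon, len(domain)):
        return value
    others: List[str] = [v for v in domain if v != value]
    return rng.choice(others)


# Hypothetical domain of four category labels for one sensitive element.
domain = ["A", "B", "C", "D"]
rng = random.Random(0)
reports = [randomized_response("B", domain, epsilon=1.0, rng=rng) for _ in range(100)]
```

A smaller epsilon pushes the keep probability toward 1/k, which means stronger privacy and noisier shared data; a larger epsilon keeps more true values and so preserves more utility.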

DOI: 10.1145/3583780.3614864; Volume 2023, pp. 131-141; Published 2023-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601092/pdf/
Citations: 0
MedCV: An Interactive Visualization System for Patient Cohort Identification from Medical Claim Data.
Ashis Kumar Chanda, Tian Bai, Brian L Egleston, Slobodan Vucetic

Healthcare providers generate a medical claim after every patient visit. A medical claim consists of a list of medical codes describing the diagnosis and any treatment provided during the visit. Medical claims have been popular in medical research as a data source for retrospective cohort studies. This paper introduces a medical claim visualization system (MedCV) that supports cohort selection from medical claim data. MedCV was developed as part of a design study in collaboration with clinical researchers and statisticians. It helps a researcher to define inclusion rules for cohort selection by revealing relationships between medical codes and visualizing medical claims and patient timelines. Evaluation of our system through a user study indicates that MedCV enables domain experts to define high-quality inclusion rules in a time-efficient manner.

DOI: 10.1145/3511808.3557157; Volume 2022, pp. 4828-4832; Published 2022-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830554/pdf/
Citations: 2
PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.
Jiasheng Sheng, Zelalem Gero, Joyce C Ho

With the ever-increasing abundance of biomedical articles, improving the accuracy of keyword search results becomes crucial for ensuring reproducible research. However, keyword extraction for biomedical articles is hard due to the existence of obscure keywords and the lack of a comprehensive benchmark. PubMedAKE is an author-assigned keyword extraction dataset that contains the title, abstract, and keywords of over 843,269 articles from the PubMed open access subset database. This dataset, publicly available on Zenodo, is the largest keyword extraction benchmark with sufficient samples to train neural networks. Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.

DOI: 10.1145/3511808.3557675; pp. 4470-4474; Published 2022-10-01; Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652778/pdf/nihms-1846241.pdf
Citations: 1
From Product Searches to Conversational Agents for E-Commerce
G. D. Fabbrizio
DOI: https://doi.org/10.1145/3511808.3557514 | Page: 5085 | Published: 2022-01-01
Citations: 0
Non-Visual Accessibility Assessment of Videos.
Ali Selman Aydin, Yu-Jung Ko, Utku Uckun, I V Ramakrishnan, Vikas Ashok

Video accessibility is crucial for blind screen-reader users as online videos are increasingly playing an essential role in education, employment, and entertainment. While there exist quite a few techniques and guidelines that focus on creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore, in this paper, we define and investigate a diverse set of video- and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As a ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility score based on the number of its wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research where the assessments are typically done by blind users, we recruited sighted users for our effort, since videos comprise a special case where sight could be required to better judge if any particular scene in a video is presently accessible or not. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we could determine the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leveraged our handcrafted features to either classify an arbitrary video as accessible/inaccessible or predict an accessibility score for the video. Evaluation of our models yielded an F1 score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, thereby demonstrating their potential in video accessibility assessment while also illuminating their current limitations and the need for further research in this area.
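
The scoring idea in the abstract above (a score equal to the number of wins in a Swiss-system tournament of pairwise comparisons) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `swiss_scores`, the `judge` callback standing in for a human annotator, and the toy `quality` values are all hypothetical, and the pairing rule is simplified (no rematch avoidance; with an odd number of items the last-ranked item simply sits out the round).

```python
from typing import Callable, Dict, List, Sequence


def swiss_scores(
    items: Sequence[str],
    judge: Callable[[str, str], str],
    rounds: int,
) -> Dict[str, int]:
    """Return the number of wins per item after `rounds` Swiss-style rounds.

    `judge(a, b)` is a hypothetical stand-in for a human annotator and
    must return whichever of the two items it prefers.
    """
    wins: Dict[str, int] = {item: 0 for item in items}
    for _ in range(rounds):
        # Rank by current wins; the stable sort keeps input order among ties.
        ranked: List[str] = sorted(items, key=lambda it: -wins[it])
        # Pair neighbours in the ranking so items with similar records meet.
        # With an odd number of items, the last one is left unpaired.
        for a, b in zip(ranked[0::2], ranked[1::2]):
            winner = judge(a, b)
            if winner not in (a, b):
                raise ValueError("judge must return one of the two items")
            wins[winner] += 1
    return wins


# Toy data (assumed, not from the paper): a hidden quality value per video.
quality = {"v1": 0.9, "v2": 0.4, "v3": 0.7, "v4": 0.1}
scores = swiss_scores(
    list(quality),
    lambda a, b: a if quality[a] >= quality[b] else b,
    rounds=2,
)
print(scores)
```

Pairing items with similar win counts is what lets a Swiss system rank many items with far fewer comparisons than a full round-robin, which matters when every comparison costs annotator time.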

DOI: https://doi.org/10.1145/3459637.3482457 | Pages: 58-67 | Published: 2021-10-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845074/pdf/nihms-1777380.pdf
Citations: 1