2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Papers

A Unified Framework for Painting Classification
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.93
Babak Saleh, A. Elgammal
In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need to develop multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems, which can benefit more high-level multimedia tasks. In order to model this similarity between paintings, we should extract the appropriate visual features for paintings and find out the best approach to learn the similarity metric based on these features. We investigate a comprehensive list of visual features and metric learning approaches to learn an optimized similarity measure between paintings. We develop a machine that is able to make aesthetic-related semantic-level judgments, such as predicting a painting's style, genre, and artist, as well as providing similarity measures optimized based on the knowledge available in the domain of art historical interpretation. Our experiments show the value of using this similarity measure for the aforementioned prediction tasks.
Citations: 16
OntoSeg: A Novel Approach to Text Segmentation Using Ontological Similarity
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.6
Mostafa Bayomi, Killian Levacher, M. R. Ghorab, S. Lawless
Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence no longer sufficient for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The proposed method uses ontological similarity to explore conceptual relations between text segments and a Hierarchical Agglomerative Clustering (HAC) algorithm to represent the text as a tree-like hierarchy that is conceptually structured. The rich structure of the created tree further allows the segmentation of text in a linear fashion at various levels of granularity. The proposed method was evaluated on a well-known dataset, and the results show that using ontological similarity in text segmentation is very promising. We also enhance the proposed method by combining ontological similarity with lexical similarity, and the results show an improvement in segmentation quality.
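The idea of clustering text blocks into a tree that can be cut at different granularities can be illustrated with a small sketch. This is not the OntoSeg implementation: the similarity function below is a placeholder (Jaccard overlap of hypothetical concept sets) standing in for the paper's ontological similarity, and restricting merges to adjacent segments is an assumption made here so that every level of the tree is a valid linear segmentation.

```python
from typing import Callable, List, Set, Tuple

Segment = Tuple[int, int]  # (index of first block, index of last block)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Placeholder similarity: overlap of two concept sets.

    Stands in for an ontological similarity measure; not the paper's metric.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def adjacent_hac(
    blocks: List[Set[str]],
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[List[Segment]]:
    """Agglomerative clustering that only merges neighbouring segments.

    Returns one segmentation per merge step, from finest (one block per
    segment) to coarsest (a single segment covering the whole text).
    """
    # Each working segment carries its span and the union of its concepts.
    segments = [(i, i, set(b)) for i, b in enumerate(blocks)]
    levels = [[(s, e) for s, e, _ in segments]]
    while len(segments) > 1:
        # Pick the most similar pair of adjacent segments (first one on ties).
        best = max(
            range(len(segments) - 1),
            key=lambda i: sim(segments[i][2], segments[i + 1][2]),
        )
        start, _, left = segments[best]
        _, end, right = segments[best + 1]
        segments[best:best + 2] = [(start, end, left | right)]
        levels.append([(s, e) for s, e, _ in segments])
    return levels


if __name__ == "__main__":
    toy_blocks = [
        {"cat", "animal"},
        {"dog", "animal"},
        {"stock", "finance"},
        {"bank", "finance"},
    ]
    for level in adjacent_hac(toy_blocks):
        print(level)
```

Each entry of the returned list is one level of the hierarchy, so a coarser or finer segmentation is obtained by picking a different level rather than re-running the clustering.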
Citations: 14
MERLIN -- A Tool for Multi-party Privacy-Preserving Record Linkage
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.101
Thilina Ranbaduge, Dinusha Vatsalan, P. Christen
Many organizations, including businesses, government agencies and research organizations, are collecting vast amounts of data, which are stored, processed and analyzed to mine interesting patterns and knowledge to support efficient and quality decision making. In order to improve data quality and to facilitate further analysis, many application domains require information from multiple sources to be integrated and combined. The process of matching and aggregating records that relate to the same entities from different data sources without compromising their privacy is known as 'privacy-preserving record linkage' (PPRL), 'blind data linkage' or 'private record linkage'. In this paper we present MERLIN, an online tool that demonstrates various PPRL methods in a multi-party context. In this demonstration we show different private multi-party blocking and matching techniques, and illustrate the usability of MERLIN by presenting quality and performance measures of various PPRL methods. We believe MERLIN will help practitioners and researchers to better understand the pipeline of the PPRL process, to compare different multi-party PPRL techniques, and to determine the best technique to use for their needs.
Citations: 8
OLLDA: A Supervised and Dynamic Topic Mining Framework in Twitter
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.132
Shatha Jaradat, Nima Dokoohaki, M. Matskin
Analyzing media in real time is of great importance, with social media platforms at the epicenter of crunching, digesting and disseminating content to individuals connected to these platforms. Within this context, topic models, especially LDA, have gained strong momentum due to their scalability, inference power and compact semantics. However, state-of-the-art topic models fall short in handling large chunks of streaming data arriving dynamically onto the platform, which hinders their quality of interpretation as well as their adaptability to information overload. As a result, in this manuscript we propose a labelled and online extension to LDA (OLLDA), which incorporates supervision through external labeling and the capability of quickly digesting real-time updates, thus making it more adaptive to Twitter and similar platforms. Our proposed extension is capable of handling large quantities of newly arrived documents in a stream and, at the same time, of achieving high topic inference quality given the short and often sloppy text of tweets. Our approach mainly uses an approximate inference technique based on variational inference coupled with a labeled LDA model. We conclude by presenting experiments using a one-year crawl of Twitter data that show significantly improved topical inference as well as temporal user profile classification when compared to state-of-the-art baselines.
Citations: 4
Constructing Topic Hierarchies from Social Media Data
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.146
Yuhao Zhang, W. Mao, D. Zeng
Constructing topic hierarchies from data automatically can help us better understand the contents and structure of information and benefit many applications in security informatics. The existing topic hierarchy construction methods either need the structure to be specified manually, or are not robust enough for sparse and noisy social media data such as microblogs. In this paper, we propose an approach to automatically construct topic hierarchies from microblog data in a bottom-up manner. We detect topics first and then build the topic structure based on a tree combination method. We conduct a preliminary empirical study based on Weibo data. The experimental results show that the topic hierarchies generated by our method are meaningful.
Citations: 1
Building a National Perinatal Data Base without the Use of Unique Personal Identifiers
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.19
R. Schnell, C. Borgs
To assess the quality of hospital care, national databases of standard medical procedures are common. A widely known example is national databases of births. If unique personal identification numbers are available (as in Scandinavian countries), the construction of such databases is trivial from a computational point of view. However, due to privacy legislation, such identifiers are not available in all countries. Given such constraints, the construction of a national perinatal database has to rely on other patient identifiers, such as names and dates of birth. These kinds of identifiers are prone to errors. Furthermore, some jurisdictions require the encryption of personal identifiers. The resulting problem is therefore an example of Privacy Preserving Record Linkage (PPRL). This contribution describes the design considerations for a national perinatal database using data of about 600,000 births in about 1,000 hospitals. Based on simulations, recommendations for parameter settings of Bloom-filter-based PPRL are given for this real-world application.
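For readers unfamiliar with Bloom-filter-based PPRL, the following is a minimal sketch of the general technique, not the configuration recommended in the paper. It assumes names are split into padded character bigrams, each bigram is mapped to `k` positions in a filter of length `m` via keyed double hashing, and two encodings are compared with the Dice coefficient. The function names, the choice of HMAC with SHA-1 and SHA-256, and the defaults `m=1000`, `k=10` are all illustrative assumptions.

```python
import hashlib
import hmac
from typing import Set


def bigrams(value: str) -> Set[str]:
    """Split a lower-cased, padded string into character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, key: bytes, m: int = 1000, k: int = 10) -> Set[int]:
    """Return the set of bit positions set to 1 for `value`.

    Each bigram is hashed with two keyed hash functions; position i is
    (h1 + i * h2) mod m. The secret `key` is assumed to be shared by the
    data owners but unknown to the linkage unit.
    """
    positions: Set[int] = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    secret = b"example-shared-secret"  # illustrative key only
    smith = bloom_encode("Smith", secret)
    smyth = bloom_encode("Smyth", secret)
    jones = bloom_encode("Jones", secret)
    print("Smith vs Smyth:", round(dice(smith, smyth), 3))
    print("Smith vs Jones:", round(dice(smith, jones), 3))
```

Because similar spellings share most of their bigrams, their encodings share most of their set bits, which is what makes error-tolerant matching possible on encoded identifiers. The filter length and number of hash functions trade linkage quality against resistance to re-identification, which is the kind of parameter choice the abstract refers to.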
Citations: 7
Toward Comprehensive Attribution of Healthcare Cost Changes
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.144
Dmitriy A. Katz-Rogozhnikov, Dennis Wei, Gigi Y. Yuen-Reed, K. Ramamurthy, A. Mojsilovic
Health insurance companies wish to understand the main drivers behind changes in their costs to enable targeted and proactive management of their operations. This paper presents a comprehensive approach to cost change attribution that encompasses a range of factors represented in insurance transaction data, including medical procedures, healthcare provider characteristics, patient features, and geographic locations. To allow consideration of such a large number of features and their combinations, we combine feature selection, using regularization and significance testing, with a multiplicative model to account for the nonlinear nature of multi-morbidities. The proposed regression procedure also accommodates real-world aspects of the healthcare domain such as hierarchical relationships among factors and the insurer's differing abilities to address different factors. We describe deployment of the method for a large health insurance company in the United States. Compared to the company's expert analysis on the same dataset, the proposed method offers multiple advantages: 1) a unified view of the most significant cost factors across all categories, 2) discovery of smaller-scale anomalous factors missed by the experts, 3) early identification of emerging factors before all claims have been processed, and 4) an efficient automated process that can save months of manual effort.
Citations: 2
Towards Automatic Pharmacovigilance: Analysing Patient Reviews and Sentiment on Oncological Drugs
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.230
Arpita Mishra, A. Malviya, Sanchit Aggarwal
The collection, detection and monitoring of information such as side effects, adverse effects, warnings, and precautions for pharmaceutical products is a challenging task. With the advent of user forums, online reviews have become a significant source of information about products. In this work, we aim to utilize patients' reviews of pharmaceutical drugs on various health communities to identify frequently occurring issues. We compare these issues with Food and Drug Administration (FDA)-approved drug labels for possible improvements. We focus on oncological drugs and develop a scalable system for mapping interventions against indications and the respective symptoms from patient comments. Using these mappings, our system is able to compare different sections of FDA labels for recommendations. We use an SVM-based framework for sentiment analysis to give an overall rating to the drugs. We further incorporate aspect-based sentiment analysis to find the orientation of drug reviews for specific targets.
Citations: 20
Laplacian SVM Based Feature Selection Improves Medical Event Reports Classification
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.141
S. Fodeh, A. Benin, P. Miller, Kyle Lee, Michele Koss, C. Brandt
Timely reporting and analysis of adverse events and medical errors is critical to driving forward programs in patient safety. However, due to the large numbers of event reports accumulating daily in health institutions, manually finding and labeling certain types of errors or events is becoming increasingly challenging. We propose to automatically classify/label event reports via semi-supervised learning, which utilizes labeled as well as unlabeled event reports to complete the classification task. We focused on classifying two types of event reports: patient mismatches and weight errors. We downloaded 9405 reports from the Connecticut Children's Medical Center reporting system. We generated two samples of labeled and unlabeled reports containing 3155 and 255 reports for the patient mismatch and the weight error use cases respectively. We developed the feature-based Laplacian Support Vector Machine (FS-LapSVM), a hybrid framework that combines feature selection with the Laplacian Support Vector Machine classifier (LapSVM). FS-LapSVM showed superior performance in finding patient weight error reports compared to LapSVM. The FS-LapSVM classifier also outperformed standard LapSVM in classifying patient mismatch reports across all metrics.
Citations: 2
Estimating Contextual Relationships of Stakeholders in Scenarios Using DBpedia
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.16
Teruaki Hayashi
Expectations for the creation of new businesses based on combining data from different domains, organizations, and sections have increased. It is important to consider which stakeholders are involved in these new businesses and how they are involved. However, the combination of stakeholders and their relationships in a scenario depends on the context and takes various patterns, making it difficult to create a reliable business scenario that accounts for all the stakeholders in various domains. In this paper, we propose a stakeholder recommender system to support the generation of scenarios for data utilization. We implemented a system that externalizes relevant stakeholders and estimates their relationships in scenarios for a given context, using DBpedia and scenarios generated in Action Planning as knowledge bases.
Citations: 0