A Unified Framework for Painting Classification
Babak Saleh, A. Elgammal
DOI: 10.1109/ICDMW.2015.93

In the past few years, the number of fine-art collections that are digitized and publicly available has been growing rapidly. With the availability of such large collections of digitized artworks comes the need for multimedia systems to archive and retrieve this pool of data. Measuring the visual similarity between artistic items is an essential step for such multimedia systems and can benefit higher-level multimedia tasks. To model this similarity between paintings, we must extract the appropriate visual features for paintings and find the best approach to learn a similarity metric based on these features. We investigate a comprehensive list of visual features and metric learning approaches to learn an optimized similarity measure between paintings. We develop a machine that is able to make aesthetic-related, semantic-level judgments, such as predicting a painting's style, genre, and artist, and to provide similarity measures optimized with the knowledge available in the domain of art-historical interpretation. Our experiments show the value of using this similarity measure for the aforementioned prediction tasks.
OntoSeg: A Novel Approach to Text Segmentation Using Ontological Similarity
Mostafa Bayomi, Killian Levacher, M. R. Ghorab, S. Lawless
DOI: 10.1109/ICDMW.2015.6

Text segmentation (TS) aims to divide long text into coherent segments that reflect the subtopic structure of the text. It benefits many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence no longer sufficient for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The proposed method uses ontological similarity to explore conceptual relations between text segments and a Hierarchical Agglomerative Clustering (HAC) algorithm to represent the text as a conceptually structured, tree-like hierarchy. The rich structure of the resulting tree further allows the text to be segmented linearly at various levels of granularity. The method was evaluated on a well-known dataset, and the results show that using ontological similarity in text segmentation is very promising. We also enhance the method by combining ontological similarity with lexical similarity, and the results show an improvement in segmentation quality.
MERLIN -- A Tool for Multi-party Privacy-Preserving Record Linkage
Thilina Ranbaduge, Dinusha Vatsalan, P. Christen
DOI: 10.1109/ICDMW.2015.101

Many organizations, including businesses, government agencies and research organizations, collect vast amounts of data, which are stored, processed and analyzed to mine interesting patterns and knowledge that support efficient, high-quality decision making. To improve data quality and to facilitate further analysis, many application domains require information from multiple sources to be integrated and combined. The process of matching and aggregating records that relate to the same entities from different data sources without compromising their privacy is known as 'privacy-preserving record linkage' (PPRL), 'blind data linkage' or 'private record linkage'. In this paper we present MERLIN, an online tool that demonstrates various PPRL methods in a multi-party context. In this demonstration we show different private multi-party blocking and matching techniques, and illustrate the usability of MERLIN by presenting quality and performance measures of various PPRL methods. We believe MERLIN will help practitioners and researchers to better understand the pipeline of the PPRL process, to compare different multi-party PPRL techniques, and to determine the best technique for their needs.
OLLDA: A Supervised and Dynamic Topic Mining Framework in Twitter
Shatha Jaradat, Nima Dokoohaki, M. Matskin
DOI: 10.1109/ICDMW.2015.132

Analyzing media in real time is of great importance, with social media platforms at the epicenter of crunching, digesting and disseminating content to the individuals connected to them. Within this context, topic models, especially LDA, have gained strong momentum due to their scalability, inference power and compact semantics. However, state-of-the-art topic models fall short in handling large chunks of data arriving dynamically in a stream, which hinders their quality of interpretation as well as their adaptability to information overload. We therefore propose a labelled and online extension of LDA (OLLDA), which incorporates supervision through external labeling and can quickly digest real-time updates, making it more adaptive to Twitter and similar platforms. Our proposed extension can handle large quantities of newly arrived documents in a stream while achieving high topic inference quality given the short and often sloppy text of tweets. Our approach mainly uses an approximate inference technique based on variational inference coupled with a labeled LDA model. We conclude by presenting experiments using a one-year crawl of Twitter data that show significantly improved topic inference as well as temporal user profile classification when compared to state-of-the-art baselines.
Constructing Topic Hierarchies from Social Media Data
Yuhao Zhang, W. Mao, D. Zeng
DOI: 10.1109/ICDMW.2015.146

Automatically constructing topic hierarchies from data can help us better understand the contents and structure of information, and benefits many applications in security informatics. Existing topic hierarchy construction methods either need the structure to be specified manually, or are not robust enough for sparse and noisy social media data such as microblogs. In this paper, we propose an approach to automatically construct topic hierarchies from microblog data in a bottom-up manner. We first detect topics and then build the topic structure using a tree combination method. We conduct a preliminary empirical study on Weibo data. The experimental results show that the topic hierarchies generated by our method are meaningful.
Building a National Perinatal Data Base without the Use of Unique Personal Identifiers
R. Schnell, C. Borgs
DOI: 10.1109/ICDMW.2015.19

To assess the quality of hospital care, national databases of standard medical procedures are common. A widely known example is national databases of births. If unique personal identification numbers are available (as in Scandinavian countries), the construction of such databases is trivial from a computational point of view. However, due to privacy legislation, such identifiers are not available in all countries. Given such constraints, the construction of a national perinatal database has to rely on other patient identifiers, such as names and dates of birth, which are prone to errors. Furthermore, some jurisdictions require the encryption of personal identifiers. The resulting problem is therefore an instance of Privacy-Preserving Record Linkage (PPRL). This contribution describes the design considerations for a national perinatal database using data on about 600,000 births in about 1,000 hospitals. Based on simulations, recommendations for the parameter settings of Bloom-filter-based PPRL are given for this real-world application.
Toward Comprehensive Attribution of Healthcare Cost Changes
Dmitriy A. Katz-Rogozhnikov, Dennis Wei, Gigi Y. Yuen-Reed, K. Ramamurthy, A. Mojsilovic
DOI: 10.1109/ICDMW.2015.144

Health insurance companies wish to understand the main drivers behind changes in their costs to enable targeted and proactive management of their operations. This paper presents a comprehensive approach to cost change attribution that encompasses a range of factors represented in insurance transaction data, including medical procedures, healthcare provider characteristics, patient features, and geographic locations. To allow consideration of such a large number of features and their combinations, we combine feature selection, using regularization and significance testing, with a multiplicative model to account for the nonlinear nature of multi-morbidities. The proposed regression procedure also accommodates real-world aspects of the healthcare domain, such as hierarchical relationships among factors and the insurer's differing abilities to address different factors. We describe deployment of the method for a large health insurance company in the United States. Compared to the company's expert analysis of the same dataset, the proposed method offers multiple advantages: 1) a unified view of the most significant cost factors across all categories, 2) discovery of smaller-scale anomalous factors missed by the experts, 3) early identification of emerging factors before all claims have been processed, and 4) an efficient automated process that can save months of manual effort.
Towards Automatic Pharmacovigilance: Analysing Patient Reviews and Sentiment on Oncological Drugs
Arpita Mishra, A. Malviya, Sanchit Aggarwal
DOI: 10.1109/ICDMW.2015.230

The collection, detection and monitoring of information such as the side effects, adverse effects, warnings and precautions of pharmaceutical products is a challenging task. With the advent of user forums, online reviews have become a significant source of information about products. In this work, we aim to utilize patients' reviews of pharmaceutical drugs on various health communities to identify frequently occurring issues. We compare these issues with Food and Drug Administration (FDA) approved drug labels for possible improvements. We focus on oncological drugs and develop a scalable system for mapping interventions against indications and the respective symptoms from patient comments. Using these mappings, our system is able to compare different sections of FDA labels for recommendations. We use an SVM-based framework for sentiment analysis to give an overall rating to the drugs. We further incorporate aspect-based sentiment analysis to find the orientation of drug reviews towards specific targets.
Laplacian SVM Based Feature Selection Improves Medical Event Reports Classification
S. Fodeh, A. Benin, P. Miller, Kyle Lee, Michele Koss, C. Brandt
DOI: 10.1109/ICDMW.2015.141

Timely reporting and analysis of adverse events and medical errors is critical to driving patient-safety programs forward. However, given the large numbers of event reports accumulating daily in health institutions, manually finding and labeling certain types of errors or events is becoming increasingly challenging. We propose to automatically classify/label event reports via semi-supervised learning, which utilizes labeled as well as unlabeled event reports to complete the classification task. We focus on classifying two types of event reports: patient mismatches and weight errors. We downloaded 9,405 reports from the Connecticut Children's Medical Center reporting system and generated two samples of labeled and unlabeled reports containing 3,155 and 255 reports for the patient-mismatch and weight-error use cases, respectively. We developed the feature-based Laplacian Support Vector Machine (FS-LapSVM), a hybrid framework that combines feature selection with the Laplacian Support Vector Machine classifier (LapSVM). FS-LapSVM performed better than LapSVM at finding patient weight-error reports, and it outperformed standard LapSVM in classifying patient-mismatch reports across all metrics.
Estimating Contextual Relationships of Stakeholders in Scenarios Using DBpedia
Teruaki Hayashi
DOI: 10.1109/ICDMW.2015.16

Expectations for the creation of new businesses based on combining data from different domains, organizations, and sections have been rising. It is important to consider which stakeholders are involved in these new businesses, and how. However, the combination of stakeholders and their relationships in a scenario depends on the context and can take many patterns, which makes it difficult to create a reliable business scenario that takes account of all the stakeholders in the various domains. In this paper, we propose a stakeholder recommender system to support the generation of scenarios for data utilization. We implemented a system that externalizes relevant stakeholders and estimates stakeholders' relationships in scenarios under a given context, using DBpedia and scenarios generated in Action Planning as knowledge bases.