Yang Yang, N. Chawla, P. Basu, Bhaskar Prabhala, T. L. Porta
Understanding how humans move is a long-standing challenge in the natural sciences. An important question is: to what degree is human behavior predictable? The ability to foresee human mobility is crucial for applications ranging from predicting the spread of human diseases to urban planning. Previous research has focused on predicting individual mobility behavior, such as the next-location prediction problem. In this paper we study human mobility behavior from the perspective of network science. In the human mobility network, a link exists between two people if they are physically proximal to each other. We perform both microscopic and macroscopic explorations of human mobility patterns. From the microscopic perspective, our objective is to determine whether two people will be in proximity of each other. From the macroscopic perspective, we are interested in whether we can infer the future topology of the human mobility network. We explore both problems using link prediction techniques; our methodology is demonstrated to achieve a greater degree of precision in predicting future mobility topology.
Link prediction in human mobility networks. Yang Yang, N. Chawla, P. Basu, Bhaskar Prabhala, T. L. Porta. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492656
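To make the microscopic question concrete (will two people be in proximity?), here is a minimal sketch of neighborhood-based link prediction on a proximity network. It is not the authors' method: the Jaccard score is a standard baseline swapped in for illustration, and the contact records, names, and helper functions are hypothetical.

```python
from itertools import combinations


def build_proximity_graph(contacts):
    """Build an undirected adjacency map from (person_a, person_b) proximity records."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def jaccard_scores(graph):
    """Score each currently unlinked pair by the Jaccard overlap of their neighbor sets."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to predict
        union = graph[u] | graph[v]
        if union:
            scores[(u, v)] = len(graph[u] & graph[v]) / len(union)
    return scores


# Hypothetical proximity records observed in a past time window.
contacts = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("bob", "dan")]
graph = build_proximity_graph(contacts)

# Highest-scoring unlinked pairs are the predicted future proximity links.
ranked = sorted(jaccard_scores(graph).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

In this toy network the only unlinked pair is ann and dan, who share all of their neighbors, so that pair receives the top score. Predicting the macroscopic topology would amount to thresholding such scores over all pairs.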
Contact diaries are interpersonal communication logs collected in sociological and epidemiological studies. These logs can be used to study the social patterns of communities over a period of time. A dataset composed of diaries maps well to a set of one-tiered, categorical, independent, and egocentric networks. This paper presents an interface for the visualization and analysis of contact diary datasets using an interactive radial mapping scheme, with case studies illustrating a standard workflow using the application. We facilitate individual diary analysis, multi-dataset comparison, and an overlay interface for investigating a set of many diaries in a single space. With this interface, network researchers can use visualization to enhance their analysis of contact diaries.
An interactive visualization interface for studying egocentric, categorical, contact diary datasets. Chris Bryan, K. Ma, Yang-chih Fu. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492636
Mobile devices are becoming an integral access point for Electronic Health Records (EHR). This creates the need to enforce some level of reliability in terms of service access time. However, supporting real-time access and service synchronization in highly distributed mobile environments can be challenging because mobile devices rely on wireless communication media, which can be unstable due to the mobility of healthcare professionals. As part of ongoing joint research with the City Hospital in Saskatoon, Canada, we focus on providing real-time accessibility of medical records in the mobile environment. We propose a cloud-hosted middleware which performs macro activities such as medical services composition, data hoarding, and medical data events management. The evaluation of the framework, called Med App, shows that medical data dissemination can be achieved in a low-latency fashion.
Efficient mobile services consumption in mHealth. Richard K. Lomotey, R. Deters. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2500279
Communities are vehicles for efficiently disseminating news, rumors, and opinions in human social networks. Modeling information diffusion through a network can enable us to reach a superior functional understanding of the effect of network structures such as communities on information propagation. The intrinsic assumption is that form follows function: rational actors exercise social choice mechanisms to join communities that best serve their information needs. Particle Swarm Optimization (PSO) was originally designed to simulate aggregate social behavior; our proposed diffusion model, PSODM (Particle Swarm Optimization Diffusion Model), models information flow in a network by creating particle swarms for local network neighborhoods that optimize a continuous version of Holland's hyperplane-defined objective functions. In this paper, we show how our approach differs from prior modeling work in the area and demonstrate that it outperforms existing model-based community detection methods on several social network datasets.
Modeling information diffusion and community membership using stochastic optimization. Alireza Hajibagheri, A. Hamzeh, G. Sukthankar. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492545
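For readers unfamiliar with the optimizer that PSODM builds on, the following is a minimal sketch of generic particle swarm optimization. It is not PSODM itself: the sphere objective stands in for the hyperplane-defined functions mentioned in the abstract, and the function names, swarm size, and coefficient values are illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iters=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Generic PSO: each particle is pulled toward its own best and the swarm's best position."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin (an assumption for illustration)."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=2)
print(best_position, best_value)
```

The abstract describes creating one swarm per local network neighborhood; in this sketch that would correspond to calling `pso_minimize` once per neighborhood with a neighborhood-specific objective.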
Access to massive amounts of real-time, user-generated personal information from microblogging services, such as Twitter and Facebook, has the potential to enable new location-based recommendation and advertising services. However, sparse user profile information and low adoption of per-message geo-coordinate information necessitate the development of location detection techniques that expose a user's location from message content. We propose and evaluate content-based machine learning techniques to a) identify tweets containing a user's location, and b) categorize a user location as the author's present or future location. Such an approach is advantageous because it a) relies purely on message content, b) can be used to predict a user's future presence at a location, c) relates user locations to some context (activities, trip plans, etc.), and d) can be used to profile a user's constantly evolving location. Our experimental evaluation shows that the proposed techniques can identify and categorize user locations from message content with high accuracy. We also extract the time entities associated with a user's future location to show when the user would be at that location. Finally, we illustrate the location-based data analytics potential of these techniques on two real-world datasets.
Predicting time-sensitive user locations from social media. A. Jaiswal, Wei Peng, Tong Sun. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2500229
Community detection has emerged as an attractive topic due to the increasing need to understand and manage networked data of tremendous magnitude. Networked data usually consists of links between entities and attributes describing the entities. Various approaches have been proposed for detecting communities by utilizing the link information and/or attribute information. In this work, we study the problem of community detection for networked data with additional authorship information. By authorship, we mean that each entity in the network is authored by another type of entity (e.g., wiki pages are edited by users, products are purchased by customers), which we refer to as authors. Communities of entities are affected by their authors, e.g., two entities that are associated with the same author tend to belong to the same community. Therefore, leveraging the authorship information would help us better detect the communities in the networked data. However, it also brings new challenges to community detection. The foremost question is how to model the correlation between communities and authorships. In this work, we address this question by proposing probabilistic models based on the popularity link model [1], which has been demonstrated to yield encouraging results for community detection. We employ two methods for modeling the authorships: (i) the first generates the authorships independently from links, using community memberships and popularities of authors, by analogy with the popularity link model; (ii) the second models the links between entities based on authorships together with community memberships and popularities of nodes, which is an analog of the previous author-topic model.
Building on the basic models, we explore several extensions: (i) we model the community memberships of authors by those of their authored entities to reduce the number of redundant parameters; and (ii) we model the community memberships of entities and/or authors by their attributes using a discriminative approach. We demonstrate the effectiveness of the proposed models through empirical studies.
Community detection by popularity based models for authored networked data. Tianbao Yang, Prakash Mandayam Comar, Linli Xu. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492520
Community Question Answering (CQA) services enable their users to exchange knowledge in the form of questions and answers. By allowing users to contribute knowledge, CQA not only satisfies the question askers but also provides valuable references to other users with similar queries. Due to the large volume of questions, not all questions get fully answered. As a result, it can be useful to route a question to a potential answerer. In this paper, we present a question routing scheme which takes into account the answering, commenting, and voting propensities of the users. Unlike prior work, which focuses on routing a question to the most desirable expert, we focus on routing it to a group of users who would be willing to collaborate and provide useful answers to that question. Through empirical evidence, we show that more answers and comments are desirable for improving the lasting value of a question-answer thread. As a result, our focus is on routing a question to a team of compatible users. We propose a recommendation model that takes into account the compatibility, topical expertise, and availability of the users. Our experiments over a large real-world dataset show the effectiveness of our approach over several baseline models.
Routing questions for collaborative answering in Community Question Answering. Shuo Chang, Aditya Pal. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492559
Social interactions preceding and succeeding trust formation can be significant indicators of the formation of trust in online social networks. In this research we analyze the social interaction trends that precede and follow the formation of trust in these networks. This enables us to hypothesize novel theories that explain the formation of trust in online social settings and to provide key insights. We find that a certain threshold of socialization needs to be met for trust to develop between two individuals. This threshold differs across persons and across networks. Once the trust relation has developed between a pair of characters connected by some social relation (also referred to as a character dyad), trust can be maintained with a lower rate of socialization. Our first set of experiments addresses the relationship prediction problem. We predict the emergence of a social relationship such as grouping, mentoring, or trading between two individuals over a period of time by looking at the past characteristics of the network. We find that features related to trust have very little impact on this prediction. In the final set of experiments, we predict the formation of trust between individuals by looking at the topographical and semantic social interaction features between them. We generate three semantic dimensions for this task, which can be recomputed with an observed social variable (say, grouping) to create a new semantic social variable. In this endeavor, we show that including features related to socialization yields an increase of approximately 4-9% in accuracy for trust relationship predictions.
Socialization and trust formation: A mutual reinforcement? An exploratory analysis in an online virtual setting. Atanu Roy, Z. Borbora, J. Srivastava. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492550
Existing approaches to predicting tie strength between users involve either online social networks or location-based social networks. To date, few studies have combined these networks to investigate the intensity of social relations between users. In this paper we analyzed tie strength, defined in terms of partners and acquaintances, in two domains: a location-based social network and an online social network (Second Life). We compared user pairs in terms of their partnership and found significant differences between partners and acquaintances. Following these observations, we evaluated the social proximity of users via supervised and unsupervised learning algorithms and established that homophilic features were the most valuable for the prediction of partnership.
Acquaintance or partner? Predicting partnership in online and location-based social networks. Michael Steurer, C. Trattner. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2492562
Khaled Dawoud, Shang Gao, Ala Qabaja, P. Karampelas, R. Alhajj
Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is always a need for new, sophisticated techniques that can extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology form one of the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncover much of the still-unknown information related to various diseases threatening the human body. Biomarker detection is one of the areas which have received considerable attention in the research community. There are two sources of data that could be analyzed for biomarker detection, namely gene expression data and the rich literature related to the domain. Our research group has reported achievements analyzing both sources. In this paper, we concentrate on the latter by describing a powerful tool which is capable of extracting from the content of a repository (like PubMed) the parts related to a given specific domain such as cancer, analyzing the retrieved text to extract the key terms with high frequency, presenting the extracted terms to domain experts for selecting those most relevant to the investigated domain, retrieving from the analyzed text the molecules related to the domain by considering the relevant terms, and deriving the network which will be analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.
Combining information extraction and text mining for cancer biomarker detection. Khaled Dawoud, Shang Gao, Ala Qabaja, P. Karampelas, R. Alhajj. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013), 2013-08-25. DOI: https://doi.org/10.1145/2492517.2500281
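The step of extracting high-frequency key terms from retrieved abstracts can be sketched as below. This is only an illustration under stated assumptions, not the described tool: the stop-word list, tokenizer, function name, and toy abstracts (with placeholder marker names) are all hypothetical, and no PubMed access is performed.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"the", "of", "in", "and", "is", "a", "with", "to", "for"}


def top_terms(abstracts, k=3):
    """Return the k most frequent non-stop-word terms across a list of abstract strings."""
    counts = Counter()
    for text in abstracts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(k)


# Hypothetical abstracts standing in for text retrieved from a repository.
toy_abstracts = [
    "Marker alpha is elevated in tumor tissue.",
    "Marker alpha binds marker beta in tumor cells.",
]

candidate_terms = top_terms(toy_abstracts)
print(candidate_terms)
```

In the workflow the abstract describes, a ranked list like `candidate_terms` would then be shown to domain experts, who select the terms relevant to the investigated domain before the network is derived.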