Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050852
F. Amato, R. Boselli, M. Cesarini, Fabio Mercorio, Mario Mezzanzanica, V. Moscato, Fabio Persia, A. Picariello
Today the Web represents a rich source of labour market data for both public and private operators, as a growing number of job offers are advertised through Web portals and services. In this paper, we apply and compare several techniques, namely explicit rules, machine learning, and LDA-based algorithms, to classify a real dataset of Web job offers collected from 12 heterogeneous sources against a standard classification system of occupations.
Title: "Challenge: Processing web texts for classifying job offers"
Published in: Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015)
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050801
M. Kanakaraj, R. R. Guddeti
Mining opinions and analyzing sentiments from social network data help in various fields, such as event prediction, analyzing the overall mood of the public on a particular social issue, and so on. This paper analyzes the mood of society on a particular news item from Twitter posts. The key idea of the paper is to increase classification accuracy by including Natural Language Processing (NLP) techniques, especially semantics and word sense disambiguation. The mined text is subjected to ensemble classification to analyze the sentiment. Ensemble classification combines the outputs of various independent classifiers on a particular classification problem. Experiments demonstrate that the ensemble classifier outperforms traditional machine learning classifiers by 3-5%.
Title: "Performance analysis of Ensemble methods on Twitter sentiment analysis using NLP techniques"
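The ensemble step described in the abstract above can be sketched as a simple majority vote over independent base classifiers. The classifier names and predictions below are hypothetical stand-ins, not the paper's actual models:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier labels for one sample by majority vote."""
    votes = Counter(predictions)
    return votes.most_common(1)[0][0]

# Hypothetical outputs of three independent base classifiers on four tweets.
nb_preds  = ["pos", "neg", "neu", "pos"]
svm_preds = ["pos", "pos", "neu", "neg"]
dt_preds  = ["neg", "pos", "neu", "pos"]

# Vote column-wise: each tuple holds the three classifiers' labels for one tweet.
ensemble = [majority_vote(p) for p in zip(nb_preds, svm_preds, dt_preds)]
print(ensemble)  # -> ['pos', 'pos', 'neu', 'pos']
```

Soft-voting variants instead average per-class probabilities before picking the argmax, which typically behaves better when the base classifiers are well calibrated.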
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050795
Qingliang Miao, Yao Meng, Bo Zhang
Enterprise knowledge graphs are crucial for both enterprises and their management agencies. However, enterprise knowledge graph construction faces several challenges, such as heterogeneous taxonomies, knowledge inconsistencies or conflicts, and a lack of semantic links. In this paper, we use the Linked Data paradigm to construct an enterprise knowledge graph by integrating heterogeneous enterprise data and linking it with external data. A preliminary experiment on a real-world dataset shows that the proposed approach is effective.
Title: "Chinese enterprise knowledge graph construction based on Linked Data"
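The Linked Data idea above, integrating internal enterprise facts and linking them to external datasets, can be illustrated with a minimal triple-store sketch. All entity names (`ex:AcmeLtd`, `dbpedia:Acme_Ltd`) are invented, and the wildcard query helper is a toy stand-in for an RDF store with SPARQL:

```python
# Toy triple store: enterprise facts plus an owl:sameAs link to an external dataset.
triples = [
    ("ex:AcmeLtd",       "rdf:type",        "ex:Enterprise"),
    ("ex:AcmeLtd",       "ex:industry",     "ex:Manufacturing"),
    ("ex:AcmeLtd",       "owl:sameAs",      "dbpedia:Acme_Ltd"),  # external link
    ("dbpedia:Acme_Ltd", "dbo:foundingYear", "1995"),
]

def query(s=None, p=None, o=None):
    """Match triples against a (subject, predicate, object) pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Follow the sameAs link to pull in external facts about the enterprise.
for _, _, ext in query("ex:AcmeLtd", "owl:sameAs"):
    print(query(s=ext))  # facts asserted about the linked external entity
```

In a real deployment the same pattern is expressed with an RDF library and federated SPARQL queries rather than in-memory tuples.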
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050820
Hsin-Yu Ha, Shu‐Ching Chen, Min Chen
Feature selection is an actively researched topic in various domains, mainly owing to its ability to greatly reduce the feature space and the associated computational time. Given the explosive growth of high-dimensional multimedia data, a well-designed feature selection method can be leveraged to classify multimedia content into high-level semantic concepts. In this paper, we present a multi-phase feature selection method using a maximum spanning tree built from feature correlations among multiple modalities (FC-MST). The method first thoroughly explores not only the correlations between features within and across modalities, but also the association of features with semantic concepts. Second, using these correlations, we identify important features and exclude redundant or irrelevant ones. The proposed method is tested on NUS-WIDE, a well-known benchmark multimedia dataset, and the experimental results show that it outperforms four well-known feature selection methods on all three important measurement metrics.
Title: "FC-MST: Feature correlation maximum spanning tree for multimedia concept classification"
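The core FC-MST structure, a maximum spanning tree over a feature-correlation graph, can be sketched with Prim's algorithm on a toy correlation matrix. The matrix values are invented, and the paper's full multi-phase selection pipeline is not reproduced here:

```python
def maximum_spanning_tree(corr):
    """Prim's algorithm on a symmetric correlation matrix.
    Returns the (i, j) edges of a maximum spanning tree."""
    n = len(corr)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the highest-correlation edge that extends the tree.
        i, j = max(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: corr[e[0]][e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Toy correlations among four features (e.g., two visual, two textual).
corr = [[1.0, 0.8, 0.1, 0.3],
        [0.8, 1.0, 0.2, 0.7],
        [0.1, 0.2, 1.0, 0.6],
        [0.3, 0.7, 0.6, 1.0]]
print(maximum_spanning_tree(corr))  # -> [(0, 1), (1, 3), (3, 2)]
```

The tree keeps, for every feature, only its single strongest link, which is one way to expose redundancy: features whose tree edges carry very high correlation are candidates for pruning.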
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050849
Zhengping Wu
In today's cloud service market, different providers have very different low-level mechanisms for accommodating the various types of policies of their users. Enforcing policies over multiple cloud provider domains is an intrinsically complex problem for both sides. In practice, cloud providers have to either manually update enforcement mechanisms or negotiate adjusted policies with their users. To avoid these high-cost and error-prone manual updates and adjustments, an automatic and flexible solution is desired. This paper proposes a semantic modeling and mapping approach that helps enforce high-level user policies across cloud domain boundaries when applications or IT operations must span multiple cloud domains. The approach creates policy models and maps them across cloud domain boundaries automatically or semi-automatically. Policy rules following these mappings can be tied to multiple enforcement mechanisms in different cloud domains. If a rule cannot be mapped, a manual adjustment is suggested. A case study demonstrates the efficiency and accuracy of this approach.
Title: "Multi-cloud policy enforcement through semantic modeling and mapping"
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050774
T. C. Chinsha, Shibily Joseph
Opinion mining, or sentiment analysis, is the process of analysing text about a topic written in a natural language and classifying it as positive, negative, or neutral based on the human sentiments, emotions, and opinions expressed in it. Nowadays, the number of opinions expressed through reviews on the web increases day by day, and it is practically impossible to analyse and extract opinions from such a huge number of reviews manually. Solving this problem requires an automated opinion mining approach. Automatic opinion mining can be performed at three different levels: document level, sentence level, and aspect level. Most previous work addresses document- or sentence-level opinion mining. This paper focuses on aspect-level opinion mining and proposes a new syntactic approach that combines syntactic dependencies, the aggregate score of opinion words, SentiWordNet, and an aspect table. The experimental work was done on restaurant reviews, collected from the web and tagged manually. The proposed method achieved a total accuracy of 78.04% on the annotated test set. It was also compared with a method that uses a part-of-speech tagger for feature extraction; the results show that the proposed method achieves 6% higher accuracy on the annotated test set.
Title: "A syntactic approach for aspect based opinion mining"
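The aspect-level aggregation described above can be sketched as summing the prior polarities of opinion words that a dependency parser attaches to each aspect. The tiny lexicon below is a hand-made stand-in for SentiWordNet scores, and the dependency pairs are invented examples:

```python
# Toy prior-polarity lexicon standing in for SentiWordNet scores.
OPINION_SCORES = {"great": 0.8, "tasty": 0.7, "slow": -0.6, "rude": -0.9}

# Hypothetical (aspect, opinion word) pairs, as a dependency parser might link them.
dependencies = [("food", "tasty"), ("food", "great"),
                ("service", "slow"), ("service", "rude")]

def aspect_polarity(aspect, deps, threshold=0.0):
    """Aggregate opinion-word scores attached to one aspect and threshold the sum."""
    score = sum(OPINION_SCORES.get(w, 0.0) for a, w in deps if a == aspect)
    return "positive" if score > threshold else "negative"

print(aspect_polarity("food", dependencies))     # -> positive
print(aspect_polarity("service", dependencies))  # -> negative
```

A fuller system would also handle negation (flipping the sign of a score under a `neg` dependency) and a neutral band around the threshold.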
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050816
S. Gaurav, Y. Jithendranath, Aruna Adil, Sudhakar Yadav, B. Kasturi
Today, search engines play a vital role in accessing online content. However, the data in webpages is not clearly perceived by search engines. As a result, they return a lot of irrelevant data with little of the desired information, and finding the appropriate result takes a lot of time. By studying the online educational needs of Indian school children, we aim to retrieve appropriate educational information in less time through effective search. Schema.org [5] is a collection of markup that helps webmasters mark up webpages for the retrieval of relevant information, but it does not completely cover properties related to education. The Learning Resource Metadata Initiative (LRMI) [9] has created a few properties for education and added them to schema.org. We map our study onto LRMI's work and propose some new properties as an extension to the schema, which can be useful for students and teachers.
Title: "A study to assess and enhance educational specific search on web for school children"
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050824
G. Shao, Shunxiang Wu, Tiejun Li
Different clustering-based strategies have been proposed to increase the performance of image segmentation. However, due to the complexity of the chip preparation process, real microarray images contain artifacts, noise, and spots with different shapes, which prevent these segmentation algorithms from achieving satisfactory results. To overcome these drawbacks, this paper proposes an improved k-means-clustering-based algorithm to improve the segmentation accuracy rate. First, an automatic contrast enhancement method is introduced to improve image quality. Second, maximum between-class variance gridding is conducted to separate the spots into individual areas. Then, we combine the k-means clustering algorithm with the moving k-means clustering method to gain higher segmentation precision. In addition, an adjustable circle means is used for segmenting missing spots. Finally, intensive experiments are conducted on the GEO and SMD datasets. The results show that the method presented in this paper is more accurate and robust.
Title: "cDNA microarray image segmentation with an improved moving k-means clustering method"
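As a minimal illustration of the clustering step (plain 1-D k-means on pixel intensities, not the paper's full moving k-means with contrast enhancement and gridding), a gridded spot area can be split into background and spot pixels like this; the intensity values are invented:

```python
def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means on pixel intensities (k=2: background vs. spot)."""
    centers = [min(values), max(values)][:k]  # simple initialization for k=2
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Recompute each center as its cluster mean; keep old center if empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Toy intensity values from one gridded spot area: dim background, bright spot.
pixels = [10, 12, 11, 14, 200, 210, 190, 205]
low, high = sorted(kmeans_1d(pixels))
mask = [1 if abs(p - high) < abs(p - low) else 0 for p in pixels]
print(mask)  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

The moving k-means variant additionally relocates the center of a poorly fitted cluster toward the best cluster's members, which makes the result less sensitive to initialization.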
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050825
Sunila Gollapudi
We are seeing a sea change in financial information aggregation and consumption; this could be a game changer in the financial services space, with a focus on the ability to commoditize data. The financial services industry deals with a tremendous amount of data that varies in structure, volume, and purpose. The data generated in the ecosystem (customers, the firm's own accounts, partner trades, securities transactions, etc.) is handled by many systems, each with its own perspective. Front-office systems handle the transactional behavior of the data; middle-office systems, which typically work with a drop copy of the data, subject it to intense processing, business logic, and computations (such as inventory positions, fee calculations, and commissions); and back-office systems deal with reconciliation, cleansing, exception management, etc. Then there are the analytic systems, which are concerned with auditing, compliance reporting, and business analytics. Data flowing through this ecosystem is aggregated, transformed, and transported time and again. Traditional approaches to managing such data leverage Extract-Transform-Load (ETL) technologies to set up data marts, where each mart serves a specific purpose (such as reconciliation or analytics). The result is a proliferation of transformations and marts in the organization. What is needed are architectures and IT systems that can aggregate data from many such sources without making any assumptions about how, where, or when the data will be used. The incoming data is semantically annotated and stored in a triple store within the storage tier, which offers the ability to store, query, and draw inferences using the ontology. There is a probable need for a Big Data solution here that eases data liberation and co-location.
This paper summarizes one such business case from the financial services industry, where traditional ETL silos were broken to support structurally dynamic, ever-expanding, and changing data-usage needs by employing ontology and semantic techniques such as RDF/RDFS, SPARQL, OWL, and the related stack.
Title: "Aggregating financial services data without assumptions: A semantic data reference architecture"
Pub Date: 2015-03-02 · DOI: 10.1109/ICOSC.2015.7050806
Gregor Große-Bölting, Chifumi Nishioka, A. Scherp
We present the design and application of a generic approach for the semantic extraction of professional interests from social media using a hierarchical knowledge base and spreading activation theory. With it, we can assess to what extent a user's social media life reflects his or her professional life. Named entities related to professional interests are detected using a taxonomy of terms in a particular domain; such taxonomies can be freely obtained for many professional fields, including computer science, the social sciences, economics, agriculture, medicine, and so on. In our experiments, we consider the domain of computer science and extract professional interests from a user's Twitter stream. We compare different spreading activation functions and metrics to assess the performance of the obtained results against evaluation data derived from the professional publications of the Twitter users. Besides selected existing activation functions from the literature, we also introduce a new spreading activation function that normalizes the activation w.r.t. the outdegree of the concepts.
Title: "Generic process for extracting user profiles from social media using hierarchical knowledge bases"
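A one-step sketch of spreading activation with outdegree normalization over a toy taxonomy: the hierarchy, seed activations, and decay factor below are all invented, and the paper's exact normalization may differ.

```python
# Toy taxonomy: parent -> children (a fragment of a CS subject hierarchy).
children = {
    "Computer Science": ["Machine Learning", "Databases"],
    "Machine Learning": ["Clustering", "Classification"],
    "Databases": ["SQL"],
}

def spread(activation, decay=0.5):
    """One spreading step: each concept passes decayed activation up to its parent,
    normalized by the parent's outdegree so prolific concepts are not favored."""
    result = dict(activation)
    for parent, kids in children.items():
        incoming = sum(activation.get(k, 0.0) for k in kids)
        result[parent] = result.get(parent, 0.0) + decay * incoming / len(kids)
    return result

# Entities detected in a user's Twitter stream, with initial activation counts.
seed = {"Clustering": 2.0, "Classification": 1.0, "SQL": 1.0}
print(spread(seed))
```

After one step, "Machine Learning" receives 0.5 * (2.0 + 1.0) / 2 = 0.75 and "Databases" receives 0.5 * 1.0 / 1 = 0.5, so the inner concepts now rank the user's interests; iterating propagates activation further up the hierarchy.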