K2: A Novel Data Analysis Framework to Understand US Emotions in Space and Time
Pub Date: 2019-04-03 DOI: 10.1142/S1793351X19400063
Romita Banerjee, Karima Elgarroussi, Sujing Wang, Akhil Talari, Yongli Zhang, C. Eick
Twitter is one of the most popular social media platforms, used daily by millions of users to post their opinions and emotions. Consequently, tweets have become a valuable knowledge source for emotion analysis. In this paper, we present K2, a new framework for tweet emotion mapping and emotion change analysis. It introduces a novel, generic spatio-temporal data analysis and storytelling framework that can be used to understand the emotional evolution of a specific section of the population. The input to our framework is the location and time at which each tweet was posted, together with an emotion assessment score in the range [-1, +1], with +1 representing a very high positive emotion and -1 representing a very high negative emotion. Our framework first segments the input dataset into batches, each representing a specific time interval, such as a day, a week, or a month. Next, by generalizing existing kernel density estimation techniques, we transform each batch into a continuous function that takes both positive and negative values. Contouring algorithms are then used to find, for each batch, the contiguous regions of highly positive and highly negative emotion. Finally, we apply a generic change analysis framework that monitors how the positive and negative emotion regions evolve over time: unary and binary change predicates are defined and matched against the identified spatial clusters, and a change relationship is recorded for each spatial cluster for which a match occurs. We also propose animation techniques to facilitate spatio-temporal data storytelling based on the obtained analysis results. We demonstrate our approach using tweets collected in the state of New York in June 2014.
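The emotion-weighted generalization of kernel density estimation described above lends itself to a short illustration. The following is a minimal sketch under assumptions of our own (a Gaussian kernel, a fixed bandwidth, and hypothetical names such as signed_kde); the paper's actual kernel and bandwidth choices may differ:

```python
import numpy as np

def signed_kde(points, scores, grid_x, grid_y, bandwidth=0.05):
    """Emotion-weighted (signed) Gaussian KDE on a lon/lat grid.

    points : (n, 2) array of tweet coordinates
    scores : (n,) array of emotion scores in [-1, +1]
    Returns a 2D surface that is positive where positive emotion
    dominates and negative where negative emotion dominates.
    """
    xx, yy = np.meshgrid(grid_x, grid_y)
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)          # (m, 2)
    # Squared distance from every grid cell to every tweet.
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    kernels = np.exp(-d2 / (2.0 * bandwidth ** 2))             # (m, n)
    return (kernels @ scores).reshape(len(grid_y), len(grid_x))
```

The contouring step can then trace the regions where this surface exceeds a positive threshold or falls below a negative one, for instance with matplotlib's contouring routines.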
{"title":"K2: A Novel Data Analysis Framework to Understand US Emotions in Space and Time","authors":"Romita Banerjee, Karima Elgarroussi, Sujing Wang, Akhil Talari, Yongli Zhang, C. Eick","doi":"10.1142/S1793351X19400063","DOIUrl":"https://doi.org/10.1142/S1793351X19400063","url":null,"abstract":"Twitter is one of the most popular social media platforms used by millions of users daily to post their opinions and emotions. Consequently, Twitter tweets have become a valuable knowledge source for emotion analysis. In this paper, we present a new framework, K2, for tweet emotion mapping and emotion change analysis. It introduces a novel, generic spatio-temporal data analysis and storytelling framework that can be used to understand the emotional evolution of a specific section of population. The input for our framework is the location and time of where and when the tweets were posted and an emotion assessment score in the range [Formula: see text], with [Formula: see text] representing a very high positive emotion and [Formula: see text] representing a very high negative emotion. Our framework first segments the input dataset into a number of batches with each batch representing a specific time interval. This time interval can be a week, a month or a day. By generalizing existing kernel density estimation techniques in the next step, we transform each batch into a continuous function that takes positive and negative values. We have used contouring algorithms to find the contiguous regions with highly positive and highly negative emotions belonging to each member of the batch. Finally, we apply a generic, change analysis framework that monitors how positive and negative emotion regions evolve over time. In particular, using this framework, unary and binary change predicate are defined and matched against the identified spatial clusters, and change relationships will then be recorded, for those spatial clusters for which a match occurs. We also propose animation techniques to facilitate spatio-temporal data storytelling based on the obtained spatio-temporal data analysis results. We demo our approach using tweets collected in the state of New York in the month of June 2014.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126416054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking Events in Twitter by Combining an LDA-Based Approach and a Density-Contour Clustering Approach
Pub Date: 2019-04-03 DOI: 10.1142/S1793351X19400051
Yongli Zhang, C. Eick
Nowadays, Twitter has become one of the fastest-growing microblogging services; consequently, analyzing this rich and continuously generated user content can reveal unprecedentedly valuable knowledge. In this paper, we propose a novel two-stage system to detect and track events from tweets by integrating a Latent Dirichlet Allocation (LDA)-based approach and an efficient density–contour-based spatio-temporal clustering approach. In the proposed system, we first divide the geotagged tweet stream into time windows; next, events are identified as topics in tweets using an LDA-based topic discovery step; then, each tweet is assigned an event label; finally, a density–contour-based spatio-temporal clustering approach is employed to identify spatio-temporal event clusters. In our approach, topic continuity is established by calculating KL-divergences between topics, and spatio-temporal continuity is established by a family of newly formulated spatial cluster distance functions. Moreover, the proposed density–contour clustering approach considers two types of density, "absolute" and "relative", to identify event clusters in which there is either a high density or a high percentage of event tweets. We evaluate our approach using real-world data collected from Twitter, and the experimental results show that the proposed system can not only detect and track events effectively but also discover interesting patterns from geotagged tweets.
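The KL-divergence-based topic continuity check can be sketched as follows. This is a schematic reading of the approach, not the authors' code: the symmetrized divergence, the linking threshold, and the name match_topics are our assumptions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two topic-word distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def match_topics(prev_topics, curr_topics, threshold=1.0):
    """Link each topic in the current window to its closest predecessor.

    prev_topics, curr_topics : lists of word-probability vectors over the
    same vocabulary. Returns {curr_index: prev_index} for pairs whose
    symmetrized divergence falls below the threshold.
    """
    links = {}
    for j, q in enumerate(curr_topics):
        divs = [0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
                for p in prev_topics]
        i = int(np.argmin(divs))
        if divs[i] < threshold:
            links[j] = i            # topic j continues topic i
    return links
```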
{"title":"Tracking Events in Twitter by Combining an LDA-Based Approach and a Density-Contour Clustering Approach","authors":"Yongli Zhang, C. Eick","doi":"10.1142/S1793351X19400051","DOIUrl":"https://doi.org/10.1142/S1793351X19400051","url":null,"abstract":"Nowadays, Twitter has become one of the fastest-growing microblogging services; consequently, analyzing this rich and continuously user-generated content can reveal unprecedentedly valuable knowledge. In this paper, we propose a novel two-stage system to detect and track events from tweets by integrating a Latent Dirichlet Allocation (LDA)-based approach and an efficient density–contour-based spatio-temporal clustering approach. In the proposed system, we first divide the geotagged tweet stream into temporal time windows; next, events are identified as topics in tweets using an LDA-based topic discovery step; then, each tweet is assigned an event label; next, a density–contour-based spatio-temporal clustering approach is employed to identify spatio-temporal event clusters. In our approach, topic continuity is established by calculating KL-divergences between topics and spatio-temporal continuity is established by a family of newly formulated spatial cluster distance functions. Moreover, the proposed density–contour clustering approach considers two types of densities: “absolute” density and “relative” density to identify event clusters where either there is a high density of event tweets or there is a high percentage of event tweets. We evaluate our approach using real-world data collected from Twitter, and the experimental results show that the proposed system can not only detect and track events effectively but also discover interesting patterns from geotagged tweets.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126145818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summarization of Multiple News Videos Considering the Consistency of Audio-Visual Contents
Pub Date: 2019-04-03 DOI: 10.1142/S1793351X19500016
Ye Zhang, Ryunosuke Tanishige, I. Ide, Keisuke Doman, Yasutomo Kawanishi, Daisuke Deguchi, H. Murase
News videos are a valuable source of multimedia information on real-world events. However, due to the incremental nature of their contents, a sequence of news videos on a related topic can be redundant and lengthy, and a number of methods have therefore been proposed for their summarization. Most of these methods, however, do not consider the consistency between the auditory and visual contents. This is a problem for news videos in particular, since the two do not always come from the same source. Considering this, we propose a method for summarizing a sequence of news videos that takes the consistency of auditory and visual contents into account. The proposed method first selects key-sentences from the auditory contents (Closed Captions) of each news story in the sequence, and then selects the shot in the news story whose "Visual Concepts", detected from the visual contents, are most consistent with the selected key-sentence. Finally, the audio segment corresponding to each key-sentence is synthesized with the selected shot, and the resulting clips are concatenated into a summarized video. Results from subjective experiments on summarized videos covering several news topics show the effectiveness of the proposed method.
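As a rough illustration of the consistency-based shot selection, the sketch below scores each shot by the overlap between its detected visual concept labels and the terms of the key-sentence. The Jaccard measure is a stand-in assumption of ours; the paper's actual consistency measure is not specified here:

```python
def consistency(key_sentence_terms, shot_concepts):
    """Jaccard overlap between key-sentence terms and a shot's visual concepts."""
    s, c = set(key_sentence_terms), set(shot_concepts)
    return len(s & c) / len(s | c) if s | c else 0.0

def select_shot(key_sentence_terms, shots):
    """Pick the shot whose detected concepts best match the key-sentence.

    shots : list of (shot_id, [concept labels]) pairs.
    """
    return max(shots, key=lambda sc: consistency(key_sentence_terms, sc[1]))[0]

# Toy usage: the "fire" shot wins for a sentence about a building fire.
shots = [("shot-1", ["podium", "speaker"]), ("shot-2", ["fire", "building"])]
assert select_shot(["fire", "downtown", "building"], shots) == "shot-2"
```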
{"title":"Summarization of Multiple News Videos Considering the Consistency of Audio-Visual Contents","authors":"Ye Zhang, Ryunosuke Tanishige, I. Ide, Keisuke Doman, Yasutomo Kawanishi, Daisuke Deguchi, H. Murase","doi":"10.1142/S1793351X19500016","DOIUrl":"https://doi.org/10.1142/S1793351X19500016","url":null,"abstract":"News videos are valuable multimedia information on real-world events. However, due to the incremental nature of the contents, a sequence of news videos on a related news topic could be redundant and lengthy. Thus, a number of methods have been proposed for their summarization. However, there is a problem that most of these methods do not consider the consistency between the auditory and visual contents. This becomes a problem in the case of news videos, since both contents do not always come from the same source. Considering this, in this paper, we propose a method for summarizing a sequence of news videos considering the consistency of auditory and visual contents. The proposed method first selects key-sentences from the auditory contents (Closed Caption) of each news story in the sequence, and next selects a shot in the news story whose “Visual Concepts” detected from the visual contents are the most consistent with the selected key-sentence. In the end, the audio segment corresponding to each key-sentence is synthesized with the selected shot, and then these clips are concatenated into a summarized video. Results from subjective experiments on summarized videos on several news topics show the effectiveness of the proposed method.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126298741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identification of Discriminative Subnetwork from fMRI-Based Complete Functional Connectivity Networks
Pub Date: 2019-04-03 DOI: 10.1142/S1793351X19400026
S. M. Hamdi, Yubao Wu, R. Angryk, L. Krishnamurthy, R. Morris
The comprehensive set of neuronal connections of the human brain, known as the human connectome, has provided valuable insight into neurological and neurodevelopmental disorders. Functional Magnetic Resonance Imaging (fMRI) has facilitated this research by capturing regionally specific brain activity. Resting-state fMRI is used to extract functional connectivity networks, which are edge-weighted complete graphs. In these complete functional connectivity networks, each node represents one brain region, or Region of Interest (ROI), and each edge weight represents the strength of functional connectivity between the adjacent ROIs. In order to leverage existing graph mining methodologies, these complete graphs are often made sparse by applying thresholds to the edge weights. This approach can discard information that is discriminative for biomarker detection, i.e. finding discriminative ROIs and connections given data from healthy and disabled populations. In this work, we demonstrate a novel framework for representing complete functional connectivity networks in a threshold-free manner and identifying biomarkers using feature selection algorithms. Additionally, to compute meaningful representations of the discriminative ROIs and connections, we apply tensor decomposition techniques. Experiments on an fMRI dataset of neurodevelopmental reading disabilities show the highly interpretable nature of our approach in finding biomarkers of the disorder.
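The threshold-free representation amounts to keeping every edge weight of the complete graph as a feature. Below is a minimal sketch, assuming scikit-learn's univariate ANOVA F-test as the feature selection algorithm; the paper treats the feature selection algorithm generically, so this specific choice is ours:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def connectivity_features(conn_matrices):
    """Flatten each ROI x ROI connectivity matrix into its upper triangle,
    keeping every edge weight (no threshold applied)."""
    n = conn_matrices[0].shape[0]
    iu = np.triu_indices(n, k=1)
    return np.array([m[iu] for m in conn_matrices])

# Assumed usage, with X of shape (subjects, edges) and y the group labels:
#   X = connectivity_features(subject_matrices)
#   selector = SelectKBest(f_classif, k=100).fit(X, y)
#   top_edges = selector.get_support(indices=True)   # candidate biomarkers
```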
{"title":"Identification of Discriminative Subnetwork from fMRI-Based Complete Functional Connectivity Networks","authors":"S. M. Hamdi, Yubao Wu, R. Angryk, L. Krishnamurthy, R. Morris","doi":"10.1142/S1793351X19400026","DOIUrl":"https://doi.org/10.1142/S1793351X19400026","url":null,"abstract":"The comprehensive set of neuronal connections of the human brain, which is known as the human connectomes, has provided valuable insight into neurological and neurodevelopmental disorders. Functional Magnetic Resonance Imaging (fMRI) has facilitated this research by capturing regionally specific brain activity. Resting state fMRI is used to extract the functional connectivity networks, which are edge-weighted complete graphs. In these complete functional connectivity networks, each node represents one brain region or Region of Interest (ROI), and each edge weight represents the strength of functional connectivity of the adjacent ROIs. In order to leverage existing graph mining methodologies, these complete graphs are often made sparse by applying thresholds on weights. This approach can result in loss of discriminative information while addressing the issue of biomarkers detection, i.e. finding discriminative ROIs and connections, given the data of healthy and disabled population. In this work, we demonstrate a novel framework for representing the complete functional connectivity networks in a threshold-free manner and identifying biomarkers by using feature selection algorithms. Additionally, to compute meaningful representations of the discriminative ROIs and connections, we apply tensor decomposition techniques. Experiments on a fMRI dataset of neurodevelopmental reading disabilities show the highly interpretable nature of our approach in finding the biomarkers of the diseases.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125374972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evolutionary Heuristic A* Search: Pathfinding Algorithm with Self-Designed and Optimized Heuristic Function
Pub Date: 2019-03-01 DOI: 10.1142/S1793351X19400014
Ying Fung Yiu, E. Du, R. Mahapatra
The performance and efficiency of the A* search algorithm depend heavily on the quality of the heuristic function. Therefore, designing an optimal heuristic function becomes the primary goal when developing a search algorithm for a specific domain in artificial intelligence. However, it is difficult to design a well-constructed heuristic function without careful consideration and trial-and-error, especially for complex pathfinding problems. The complexity of a heuristic function increases, and its design becomes unmanageable, as the number of parameters involved grows. Existing approaches often avoid complex heuristic function design: they either trade accuracy for faster computation or exploit parallelism for better performance. The objective of this paper is to reduce the difficulty of complex heuristic function design for the A* search algorithm. We aim to design an algorithm that can be automatically optimized to achieve rapid search with high accuracy and low computational cost. In this paper, we present a novel design and optimization method for a Multi-Weighted-Heuristics (MWH) function, named Evolutionary Heuristic A* search (EHA*), to: (1) minimize the effort of heuristic function design via a Genetic Algorithm (GA), (2) optimize the performance of A* search and its variants, including but not limited to WA* and MHA*, and (3) guarantee completeness and optimality. The EHA* algorithm enables high-performance searches and significantly simplifies heuristic design. We apply EHA* to multiple grid-based pathfinding benchmarks to evaluate its performance. Our experimental results show that EHA* (1) is capable of choosing an accurate heuristic function that provides an optimal solution, (2) can identify and eliminate inefficient heuristics, (3) is able to automatically design a multi-heuristics function, and (4) minimizes both time and space complexity.
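To make the GA-driven heuristic design concrete, the toy sketch below evolves a weight vector over a few standard grid heuristics. Everything here is illustrative: the base heuristics, the mutation-plus-elitism scheme, and the fitness interface (e.g. node expansions of an A* run, lower is better) are our assumptions, not EHA*'s actual encoding:

```python
import random

def manhattan(a, b): return abs(a[0] - b[0]) + abs(a[1] - b[1])
def chebyshev(a, b): return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
def euclid(a, b):    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

BASE_HEURISTICS = [manhattan, chebyshev, euclid]

def combined(weights):
    """Multi-weighted heuristic: a weighted sum of the base heuristics."""
    return lambda a, b: sum(w * h(a, b) for w, h in zip(weights, BASE_HEURISTICS))

def evolve(fitness, pop_size=20, generations=50, sigma=0.1):
    """Toy GA over weight vectors; `fitness(weights)` should score an A* run
    with combined(weights), rewarding fewer expansions and good solutions."""
    pop = [[random.random() for _ in BASE_HEURISTICS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                       # lower fitness is better
        elite = pop[: pop_size // 2]                # keep the best half
        # Refill the population with mutated copies of elite individuals.
        pop = elite + [[max(0.0, w + random.gauss(0, sigma)) for w in p]
                       for p in random.choices(elite, k=pop_size - len(elite))]
    return min(pop, key=fitness)
```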
{"title":"Evolutionary Heuristic A* Search: Pathfinding Algorithm with Self-Designed and Optimized Heuristic Function","authors":"Ying Fung Yiu, E. Du, R. Mahapatra","doi":"10.1142/S1793351X19400014","DOIUrl":"https://doi.org/10.1142/S1793351X19400014","url":null,"abstract":"The performance and efficiency of A* search algorithm heavily depends on the quality of the heuristic function. Therefore, designing an optimal heuristic function becomes the primary goal of developing a search algorithm for specific domains in artificial intelligence. However, it is difficult to design a well-constructed heuristic function without careful consideration and trial-and-error, especially for complex pathfinding problems. The complexity of a heuristic function increases and becomes unmanageable to design when an increasing number of parameters are involved. Existing approaches often avoid complex heuristic function design: they either trade-off the accuracy for faster computation or taking advantage of the parallelism for better performance. The objective of this paper is to reduce the difficulty of complex heuristic function design for A* search algorithm. We aim to design an algorithm that can be automatically optimized to achieve rapid search with high accuracy and low computational cost. In this paper, we present a novel design and optimization method for a Multi-Weighted-Heuristics function (MWH) named Evolutionary Heuristic A* search (EHA*) to: (1) minimize the effort on heuristic function design via Genetic Algorithm (GA), (2) optimize the performance of A* search and its variants including but not limited to WA* and MHA*, and (3) guarantee the completeness and optimality. EHA* algorithm enables high performance searches and significantly simplifies the processing of heuristic design. We apply EHA* to multiple grid-based pathfinding benchmarks to evaluate the performance. Our experiment result shows that EHA* (1) is capable of choosing an accurate heuristic function that provides an optimal solution, (2) can identify and eliminate inefficient heuristics, (3) is able to automatically design multi-heuristics function, and (4) minimizes both the time and space complexity.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"569 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113996521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional Nonlinear Differential Recurrent Neural Networks for Crowd Scene Understanding
Pub Date: 2018-12-01 DOI: 10.1142/S1793351X18400196
Naifan Zhuang, T. Kieu, Jun Ye, K. Hua
With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Unlike traditional non-end-to-end solutions, which separate the steps of feature extraction and parameter learning, CNDRNN utilizes a unified deep model that optimizes the parameters of the CNN and RNN jointly, and thus has the potential to produce a more harmonious model. The proposed architecture takes sequential raw image data as input and does not rely on tracklet or trajectory detection; it therefore has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of the CNN and RNN, CNDRNN can effectively analyze crowd semantics: the CNN is good at modeling the semantic crowd scene information, while the nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in the differential RNN progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns, with deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be "deep in time"; our proposed method, CNDRNN, instead models spatial and temporal information in a unified architecture and achieves "deep in space and time." Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.
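The derivative-of-states idea can be caricatured with a first-order cell in which the state change ds drives a gate. This is a deliberately simplified sketch of our own (a single DoS order, no forget or output gates, hypothetical parameter names), not the CNDRNN architecture itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def drnn_step(x_t, h_prev, s_prev, ds_prev, params):
    """One step of a first-order differential-RNN cell: the gate is driven by
    the derivative of the state, ds = s_t - s_{t-1}, so salient dynamics
    (large state change) open the gate.  All arguments are 1D vectors except
    params, a tuple of weight matrices (Wi, Ui, Di, Wc, Uc)."""
    Wi, Ui, Di, Wc, Uc = params
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + Di @ ds_prev)   # DoS-driven gate
    c_t = np.tanh(Wc @ x_t + Uc @ h_prev)                  # candidate state
    s_t = (1 - i_t) * s_prev + i_t * c_t                   # gated state update
    return np.tanh(s_t), s_t, s_t - s_prev                 # h_t, s_t, ds_t
```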
{"title":"Convolutional Nonlinear Differential Recurrent Neural Networks for Crowd Scene Understanding","authors":"Naifan Zhuang, T. Kieu, Jun Ye, K. Hua","doi":"10.1142/S1793351X18400196","DOIUrl":"https://doi.org/10.1142/S1793351X18400196","url":null,"abstract":"With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Different from traditional non-end-to-end solutions which separate the steps of feature extraction and parameter learning, CNDRNN utilizes a unified deep model to optimize the parameters of CNN and RNN hand in hand. It thus has the potential of generating a more harmonious model. The proposed architecture takes sequential raw image data as input, and does not rely on tracklet or trajectory detection. It thus has clear advantages over the traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of CNN and RNN, CNDRNN can effectively analyze the crowd semantics. Specifically, CNN is good at modeling the semantic crowd scene information. On the other hand, nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in differential RNN can progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns in deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be “deep in time.” Our proposed method CNDRNN, however, models the spatial and temporal information in a unified architecture and achieves “deep in space and time.” Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128621769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Plug and Play - Self-Descriptive Modular Hardware for Robotic Applications
Pub Date: 2018-12-01 DOI: 10.1142/S1793351X18500058
Christian Eymüller, Constantin Wanninger, A. Hoffmann, W. Reif
This paper describes the use of semantically annotated data to describe sensors and actuators together with their properties and capabilities. For this purpose, a plug-and-play mechanism is presented that exchanges self-descriptions between hardware devices and then uses the gathered information to execute capabilities. To combine different capabilities distributed across multiple hardware devices, an architecture is presented that links abstract capabilities; these abstract capabilities are initialized by the plug-and-play mechanism with the concrete capabilities of the discovered hardware devices.
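One way to picture the self-description exchange is as a small registry that maps abstract capability names to the devices that announce them. The JSON schema, device name, and class names below are invented for illustration; the paper's actual description format is semantically annotated and considerably richer:

```python
import json

# A hypothetical self-description a device might announce on connection.
gripper_description = json.dumps({
    "device": "gripper-01",
    "properties": {"max_payload_kg": 1.5},
    "capabilities": [{"name": "grasp", "parameters": ["width_mm", "force_n"]}],
})

class CapabilityRegistry:
    """Links abstract capability names to concrete device capabilities."""
    def __init__(self):
        self.providers = {}          # capability name -> device id

    def register(self, description_json):
        """Ingest a device's self-description (plug-and-play discovery)."""
        desc = json.loads(description_json)
        for cap in desc["capabilities"]:
            self.providers[cap["name"]] = desc["device"]

    def resolve(self, abstract_capability):
        """Return the device providing an abstract capability, if any."""
        return self.providers.get(abstract_capability)

registry = CapabilityRegistry()
registry.register(gripper_description)
assert registry.resolve("grasp") == "gripper-01"
```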
{"title":"Semantic Plug and Play - Self-Descriptive Modular Hardware for Robotic Applications","authors":"Christian Eymüller, Constantin Wanninger, A. Hoffmann, W. Reif","doi":"10.1142/S1793351X18500058","DOIUrl":"https://doi.org/10.1142/S1793351X18500058","url":null,"abstract":"This paper describes the use of semantically annotated data for the expression of sensors and actuators with their properties and capabilities. For this purpose, a plug and play mechanism is presented in order to exchange self-descriptions between several hardware devices and then use the established information for the execution of capabilities. For the combination of different capabilities distributed on multiple hardware devices an architecture is presented to link abstract capabilities. These abstract capabilities are initialized through concrete capabilities of the discovered hardware devices by the plug and play mechanism.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132635131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recurrent Visual Relationship Recognition with Triplet Unit for Diversity
Pub Date: 2018-12-01 DOI: 10.1142/S1793351X18400214
Kento Masui, A. Ochiai, Shintaro Yoshizawa, Hideki Nakayama
The task of visual relationship recognition (VRR) is to recognize multiple objects and their relationships in an image. A fundamental difficulty of this task is class–number scalability: the number of possible relationships that must be considered grows combinatorially. Another difficulty is avoiding the output of semantically redundant relationships. To overcome these challenges, this paper proposes a novel architecture combining a recurrent neural network (RNN) with a triplet unit (TU). The RNN allows our model to be optimized for outputting a sequence of relationships; by optimizing toward a semantically diverse relationship sequence, we increase the variety of the output relationships. At each step of the RNN, the TU enables the model to classify a relationship while achieving class–number scalability by decomposing the relationship into a subject–predicate–object (SPO) triplet. We evaluate our model on various datasets and compare the results to a baseline. The experimental results show that our model attains superior recall and precision with fewer predictions than the baseline, even as it produces greater variety in relationships.
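The class–number scalability argument is easy to see in code: three separate classification heads over |S| + |P| + |O| classes replace a single head over |S| x |P| x |O| joint classes. A minimal sketch of this decomposition, with hypothetical weight matrices and no claim to match the paper's exact TU internals:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def triplet_unit(h, W_s, W_p, W_o):
    """Decompose relationship classification into three heads.

    Instead of one softmax over |S|*|P|*|O| joint classes, score the
    subject, predicate, and object separately from the RNN hidden state h.
    W_s, W_p, W_o map h to the three vocabulary sizes.
    """
    s = int(np.argmax(softmax(W_s @ h)))
    p = int(np.argmax(softmax(W_p @ h)))
    o = int(np.argmax(softmax(W_o @ h)))
    return s, p, o   # indices into subject / predicate / object vocabularies
```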
{"title":"Recurrent Visual Relationship Recognition with Triplet Unit for Diversity","authors":"Kento Masui, A. Ochiai, Shintaro Yoshizawa, Hideki Nakayama","doi":"10.1142/S1793351X18400214","DOIUrl":"https://doi.org/10.1142/S1793351X18400214","url":null,"abstract":"The task of visual relationship recognition (VRR) is to recognize multiple objects and their relationships in an image. A fundamental difficulty of this task is class–number scalability, since the number of possible relationships we need to consider causes combinatorial explosion. Another difficulty of this task is modeling how to avoid outputting semantically redundant relationships. To overcome these challenges, this paper proposes a novel architecture with a recurrent neural network (RNN) and triplet unit (TU). The RNN allows our model to be optimized for outputting a sequence of relationships. By optimizing our model to a semantically diverse relationship sequence, we increase the variety in output relationships. At each step of the RNN, our TU enables the model to classify a relationship while achieving class–number scalability by decomposing a relationship into a subject–predicate–object (SPO) triplet. We evaluate our model on various datasets and compare the results to a baseline. These experimental results show our model’s superior recall and precision with fewer predictions compared to the baseline, even as it produces greater variety in relationships.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121390238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
KaraMIR: A Project for Cover Song Identification and Singing Voice Analysis Using a Karaoke Songs Dataset
Pub Date: 2018-12-01 DOI: 10.1142/S1793351X18400202
Ladislav Marsik, Petr Martisek, J. Pokorný, M. Rusek, K. Slaninová, J. Martinovič, Matthias Robine, P. Hanna, Yann Bayle
We introduce KaraMIR, a musical project dedicated to karaoke song analysis. Within KaraMIR, we define Kara1k, a dataset composed of 1000 cover songs provided by the Recisio Karafun application and the corresponding 1000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, Kara1k offers novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, and International Standard Recording Code, along with ground truths for the singer's gender, backing vocals, duets, and the lyrics' language. The KaraMIR project focuses on defining new problems and describing features and tools to solve them. We thus provide a comparison of traditional and new features for the cover song identification task using statistical methods, as well as the dynamic time warping method, on chroma, MFCC, chord, key, and chord distance features. A supporting experiment on the singer gender classification task is also presented. The KaraMIR project website supports this ongoing research.
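For the dynamic time warping comparison on frame-level features such as chroma, a textbook DTW distance suffices as an illustration; the paper's exact DTW variant and local cost function are not specified here, so Euclidean local cost is our assumption:

```python
import numpy as np

def dtw_distance(X, Y):
    """Plain dynamic-time-warping distance between two feature sequences
    (e.g. per-frame chroma vectors), with Euclidean local cost."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            # Best of insertion, deletion, and match moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A small accumulated cost between the chroma sequences of two tracks then suggests a cover relationship, subject to tempo and key normalization.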
{"title":"KaraMIR: A Project for Cover Song Identification and Singing Voice Analysis Using a Karaoke Songs Dataset","authors":"Ladislav Marsik, Petr Martisek, J. Pokorný, M. Rusek, K. Slaninová, J. Martinovič, Matthias Robine, P. Hanna, Yann Bayle","doi":"10.1142/S1793351X18400202","DOIUrl":"https://doi.org/10.1142/S1793351X18400202","url":null,"abstract":"We introduce KaraMIR, a musical project dedicated to karaoke song analysis. Within KaraMIR, we define Kara1k, a dataset composed of 1000 cover songs provided by Recisio Karafun application, and the corresponding 1000 songs by the original artists. Kara1k is mainly dedicated toward cover song identification and singing voice analysis. For both tasks, Kara1k offers novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording, but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, International Standard Recording Code and the ground truths for the singer’s gender, backing vocals, duets, and lyrics’ language. KaraMIR project focuses on defining new problems and describing features and tools to solve them. We thus provide a comparison of traditional and new features for a cover song identification task using statistical methods, as well as the dynamic time warping method on chroma, MFCC, chords, keys, and chord distance features. A supporting experiment on the singer gender classification task is also proposed. The KaraMIR project website facilitates the continuous research.","PeriodicalId":217956,"journal":{"name":"Int. J. Semantic Comput.","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132828540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}