Int. J. Semantic Comput.: latest articles, page 10

K2: A Novel Data Analysis Framework to Understand US Emotions in Space and Time

Int. J. Semantic Comput.

Pub Date : 2019-04-03 DOI: 10.1142/S1793351X19400063

Romita Banerjee, Karima Elgarroussi, Sujing Wang, Akhil Talari, Yongli Zhang, C. Eick

Twitter is one of the most popular social media platforms, used by millions of users daily to post their opinions and emotions. Consequently, tweets have become a valuable knowledge source for emotion analysis. In this paper, we present a new framework, K2, for tweet emotion mapping and emotion change analysis. It introduces a novel, generic spatio-temporal data analysis and storytelling framework that can be used to understand the emotional evolution of a specific section of a population. The input to our framework is the location and time at which each tweet was posted, together with an emotion assessment score in the range [Formula: see text], with [Formula: see text] representing a very high positive emotion and [Formula: see text] representing a very high negative emotion. Our framework first segments the input dataset into a number of batches, with each batch representing a specific time interval. This time interval can be a day, a week, or a month. In the next step, by generalizing existing kernel density estimation techniques, we transform each batch into a continuous function that takes positive and negative values. We then use contouring algorithms to find the contiguous regions of highly positive and highly negative emotion in each batch. Finally, we apply a generic change analysis framework that monitors how positive and negative emotion regions evolve over time. In particular, using this framework, unary and binary change predicates are defined and matched against the identified spatial clusters, and change relationships are then recorded for those spatial clusters for which a match occurs. We also propose animation techniques to facilitate spatio-temporal data storytelling based on the obtained analysis results. We demonstrate our approach using tweets collected in the state of New York in June 2014.
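The batching and density steps described in this abstract can be illustrated with a minimal sketch. It is not the K2 implementation: the Gaussian kernel, the weekly batching, the tuple layout, and the function names `batch_by_week` and `signed_density` are all assumptions made for illustration. The only idea taken from the abstract is that each tweet contributes to the density with the sign and magnitude of its emotion score, so the resulting function can be positive or negative.

```python
import math
from collections import defaultdict
from datetime import date


def batch_by_week(tweets):
    """Group (day, x, y, score) tuples by ISO (year, week). Hypothetical layout."""
    batches = defaultdict(list)
    for day, x, y, score in tweets:
        iso = day.isocalendar()
        batches[(iso[0], iso[1])].append((x, y, score))
    return dict(batches)


def signed_density(batch, qx, qy, bandwidth=1.0):
    """Sum of Gaussian kernels, each weighted by the tweet's signed emotion score.

    Positive scores push the value up and negative scores pull it down,
    so the function takes both positive and negative values.
    """
    total = 0.0
    for x, y, score in batch:
        squared_distance = (qx - x) ** 2 + (qy - y) ** 2
        total += score * math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return total
```

Evaluating `signed_density` on a grid and thresholding it at a high positive and a low negative level would give a rough stand-in for the contouring step.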


Citations: 0

Tracking Events in Twitter by Combining an LDA-Based Approach and a Density-Contour Clustering Approach

Int. J. Semantic Comput.

Pub Date : 2019-04-03 DOI: 10.1142/S1793351X19400051

Yongli Zhang, C. Eick

Twitter has become one of the fastest-growing microblogging services; consequently, analyzing this rich, continuously user-generated content can reveal valuable knowledge. In this paper, we propose a novel two-stage system to detect and track events from tweets by integrating a Latent Dirichlet Allocation (LDA)-based approach and an efficient density-contour-based spatio-temporal clustering approach. In the proposed system, we first divide the geotagged tweet stream into time windows; next, events are identified as topics in tweets using an LDA-based topic discovery step; then, each tweet is assigned an event label; finally, a density-contour-based spatio-temporal clustering approach is employed to identify spatio-temporal event clusters. In our approach, topic continuity is established by calculating KL-divergences between topics, and spatio-temporal continuity is established by a family of newly formulated spatial cluster distance functions. Moreover, the proposed density-contour clustering approach considers two types of densities, “absolute” density and “relative” density, to identify event clusters where there is either a high density of event tweets or a high percentage of event tweets. We evaluate our approach using real-world data collected from Twitter, and the experimental results show that the proposed system can not only detect and track events effectively but also discover interesting patterns from geotagged tweets.
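The topic-continuity idea can be sketched as follows. The abstract says only that KL-divergences between topics are calculated; the symmetrised form, the nearest-match rule, the threshold, and the names `kl_divergence`, `symmetric_kl`, and `match_topics` are assumptions made for this illustration. Topics are represented as word-probability lists over a shared vocabulary.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); symmetrising is an assumption here."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def match_topics(prev_topics, next_topics, threshold):
    """Link each topic in one time window to its closest topic in the next.

    A link (i, j) is kept only if the divergence is at most `threshold`.
    """
    links = []
    for i, p in enumerate(prev_topics):
        j, distance = min(
            ((j, symmetric_kl(p, q)) for j, q in enumerate(next_topics)),
            key=lambda pair: pair[1],
        )
        if distance <= threshold:
            links.append((i, j))
    return links
```

A topic with no link into the next window would be treated as an event that has ended, and an unlinked topic in the next window as a new event.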


Citations: 7

Summarization of Multiple News Videos Considering the Consistency of Audio-Visual Contents

Int. J. Semantic Comput.

Pub Date : 2019-04-03 DOI: 10.1142/S1793351X19500016

Ye Zhang, Ryunosuke Tanishige, I. Ide, Keisuke Doman, Yasutomo Kawanishi, Daisuke Deguchi, H. Murase

News videos are a valuable source of multimedia information on real-world events. However, due to the incremental nature of the contents, a sequence of news videos on a related news topic can be redundant and lengthy. Thus, a number of methods have been proposed for their summarization. However, most of these methods do not consider the consistency between the auditory and visual contents. This becomes a problem in the case of news videos, since the two do not always come from the same source. Considering this, we propose a method for summarizing a sequence of news videos that takes the consistency of auditory and visual contents into account. The proposed method first selects key sentences from the auditory contents (closed captions) of each news story in the sequence, and next selects the shot in the news story whose “Visual Concepts” detected from the visual contents are the most consistent with the selected key sentence. In the end, the audio segment corresponding to each key sentence is synthesized with the selected shot, and these clips are then concatenated into a summarized video. Results from subjective experiments on summarized videos on several news topics show the effectiveness of the proposed method.
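The shot-selection step can be illustrated with a small sketch. The abstract does not say how consistency between a key sentence and a shot's visual concepts is measured, so the Jaccard word overlap used here is an assumption, as are the names `consistency` and `select_shot` and the dictionary layout of `shots`.

```python
def consistency(sentence_words, shot_concepts):
    """Jaccard overlap between sentence words and detected concept labels.

    This stands in for the paper's consistency measure, which is not given
    in the abstract.
    """
    words, concepts = set(sentence_words), set(shot_concepts)
    union = words | concepts
    return len(words & concepts) / len(union) if union else 0.0


def select_shot(key_sentence, shots):
    """Return the name of the shot whose concepts best match the key sentence.

    `shots` maps a shot name to a list of concept labels (hypothetical layout).
    """
    words = key_sentence.lower().split()
    return max(shots, key=lambda name: consistency(words, shots[name]))
```

For a closed-caption sentence about a flooded river road, a shot tagged with `flood` and `river` would be preferred over a studio shot of the anchor.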


Citations: 0

Identification of Discriminative Subnetwork from fMRI-Based Complete Functional Connectivity Networks

Int. J. Semantic Comput.

Pub Date : 2019-04-03 DOI: 10.1142/S1793351X19400026

S. M. Hamdi, Yubao Wu, R. Angryk, L. Krishnamurthy, R. Morris

The comprehensive set of neuronal connections of the human brain, known as the human connectome, has provided valuable insight into neurological and neurodevelopmental disorders. Functional Magnetic Resonance Imaging (fMRI) has facilitated this research by capturing regionally specific brain activity. Resting-state fMRI is used to extract the functional connectivity networks, which are edge-weighted complete graphs. In these complete functional connectivity networks, each node represents one brain region or Region of Interest (ROI), and each edge weight represents the strength of functional connectivity between the adjacent ROIs. In order to leverage existing graph mining methodologies, these complete graphs are often made sparse by applying thresholds on the weights. This approach can result in a loss of discriminative information when addressing biomarker detection, i.e. finding discriminative ROIs and connections given data from healthy and disabled populations. In this work, we demonstrate a novel framework for representing the complete functional connectivity networks in a threshold-free manner and identifying biomarkers by using feature selection algorithms. Additionally, to compute meaningful representations of the discriminative ROIs and connections, we apply tensor decomposition techniques. Experiments on an fMRI dataset of neurodevelopmental reading disabilities show the highly interpretable nature of our approach in finding the biomarkers of the diseases.
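One way to picture a threshold-free representation is to keep every edge weight of the complete graph as a feature and let a feature-selection score decide which edges matter. The sketch below does that with a deliberately simple score, the absolute difference of group means. This score, and the names `edge_index`, `edge_features`, and `rank_edges`, are assumptions for illustration and are not the feature selection algorithms or tensor decomposition used in the paper.

```python
def edge_index(n):
    """Upper-triangle (i, j) pairs of an n-node complete graph."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def edge_features(matrix):
    """Flatten a symmetric connectivity matrix into one weight per edge.

    No threshold is applied, so no edge is discarded.
    """
    return [matrix[i][j] for i, j in edge_index(len(matrix))]


def rank_edges(group_a, group_b):
    """Rank edges by the absolute difference of mean weight between two groups."""
    features_a = [edge_features(m) for m in group_a]
    features_b = [edge_features(m) for m in group_b]
    scored = []
    for k, edge in enumerate(edge_index(len(group_a[0]))):
        mean_a = sum(f[k] for f in features_a) / len(features_a)
        mean_b = sum(f[k] for f in features_b) / len(features_b)
        scored.append((abs(mean_a - mean_b), edge))
    return [edge for _, edge in sorted(scored, reverse=True)]
```

The top-ranked edges, and the ROIs they connect, are the candidates a downstream analysis would inspect as possible biomarkers.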


Citations: 3

Evolutionary Heuristic A* Search: Pathfinding Algorithm with Self-Designed and Optimized Heuristic Function

Int. J. Semantic Comput.

Pub Date : 2019-03-01 DOI: 10.1142/S1793351X19400014

Ying Fung Yiu, E. Du, R. Mahapatra

The performance and efficiency of the A* search algorithm depend heavily on the quality of the heuristic function. Therefore, designing an optimal heuristic function becomes the primary goal of developing a search algorithm for specific domains in artificial intelligence. However, it is difficult to design a well-constructed heuristic function without careful consideration and trial and error, especially for complex pathfinding problems. The complexity of a heuristic function increases, and its design becomes unmanageable, as the number of parameters involved grows. Existing approaches often avoid complex heuristic function design: they either trade off accuracy for faster computation or take advantage of parallelism for better performance. The objective of this paper is to reduce the difficulty of complex heuristic function design for the A* search algorithm. We aim to design an algorithm that can be automatically optimized to achieve rapid search with high accuracy and low computational cost. In this paper, we present a novel design and optimization method for a Multi-Weighted-Heuristics function (MWH), named Evolutionary Heuristic A* search (EHA*), to: (1) minimize the effort of heuristic function design via a Genetic Algorithm (GA), (2) optimize the performance of A* search and its variants, including but not limited to WA* and MHA*, and (3) guarantee completeness and optimality. The EHA* algorithm enables high-performance searches and significantly simplifies the process of heuristic design. We apply EHA* to multiple grid-based pathfinding benchmarks to evaluate its performance. Our experimental results show that EHA* (1) is capable of choosing an accurate heuristic function that provides an optimal solution, (2) can identify and eliminate inefficient heuristics, (3) is able to automatically design multi-heuristic functions, and (4) minimizes both the time and space complexity.
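To make the notion of a multi-weighted heuristic concrete, the sketch below runs plain A* on a 4-connected grid with a heuristic that is a weighted sum of several base heuristics. It is not EHA*: the genetic search over weights is omitted, and the choice of Manhattan and Chebyshev distances, the grid encoding (0 = free, 1 = wall), and the function names are assumptions. With non-negative weights summing to at most 1 and admissible base heuristics, the weighted sum is itself admissible, so the returned cost is optimal.

```python
import heapq


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def a_star(grid, start, goal, weights, heuristics):
    """A* on a 4-connected grid; returns the path cost, or None if unreachable."""
    def h(cell):
        # Multi-weighted heuristic: weighted sum of the base heuristics.
        return sum(w * f(cell, goal) for w, f in zip(weights, heuristics))

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(
                        frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None
```

In an evolutionary setting, `weights` would be the genome being optimized, with fitness measured by, for example, the number of nodes expanded on benchmark maps.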


Citations: 4

Convolutional Nonlinear Differential Recurrent Neural Networks for Crowd Scene Understanding

Int. J. Semantic Comput.

Pub Date : 2018-12-01 DOI: 10.1142/S1793351X18400196

Naifan Zhuang, T. Kieu, Jun Ye, K. Hua

With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Unlike traditional non-end-to-end solutions, which separate the steps of feature extraction and parameter learning, CNDRNN uses a unified deep model to optimize the parameters of the CNN and RNN jointly. It thus has the potential to generate a more harmonious model. The proposed architecture takes sequential raw image data as input and does not rely on tracklet or trajectory detection. It thus has clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of both CNN and RNN, CNDRNN can effectively analyze crowd semantics. Specifically, the CNN is good at modeling the semantic crowd scene information, while the nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in the differential RNN progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns, with deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be “deep in time.” Our proposed method CNDRNN, however, models the spatial and temporal information in a unified architecture and achieves “deep in space and time.” Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.
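As a rough illustration of what "orders of derivative of states" can mean in discrete time, the sketch below approximates the DoS of a scalar state sequence by repeated finite differences. Treating DoS as a finite difference, using scalar states, and the name `derivative_of_states` are assumptions for this sketch; the abstract does not specify how the network computes or consumes these quantities.

```python
def derivative_of_states(states, order):
    """Approximate the order-th derivative of a state sequence.

    Each pass replaces the sequence with differences of consecutive values,
    so the result is `order` elements shorter than the input.
    """
    current = list(states)
    for _ in range(order):
        current = [later - earlier
                   for earlier, later in zip(current, current[1:])]
    return current
```

A first-order difference responds to how fast the state changes, and a second-order difference to how that change itself accelerates, which is the intuition behind letting stacked layers model higher orders.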


Citations: 3

Semantic Plug and Play - Self-Descriptive Modular Hardware for Robotic Applications

Int. J. Semantic Comput.

Pub Date : 2018-12-01 DOI: 10.1142/S1793351X18500058

Christian Eymüller, Constantin Wanninger, A. Hoffmann, W. Reif

This paper describes the use of semantically annotated data to express sensors and actuators together with their properties and capabilities. For this purpose, a plug and play mechanism is presented that exchanges self-descriptions between several hardware devices and then uses the established information for the execution of capabilities. For the combination of different capabilities distributed across multiple hardware devices, an architecture is presented to link abstract capabilities. These abstract capabilities are initialized by the plug and play mechanism through the concrete capabilities of the discovered hardware devices.
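The relationship between abstract and concrete capabilities can be pictured with a small registry sketch. Everything in it is hypothetical: the class name `CapabilityRegistry`, the use of a plain dictionary as a device's self-description, and the rule of picking the first provider. The paper's semantic annotations and discovery protocol are not reproduced here.

```python
class CapabilityRegistry:
    """Binds abstract capability names to handlers announced by devices."""

    def __init__(self):
        self._providers = {}

    def plug(self, device_id, self_description):
        """Register a device; `self_description` maps capability name to handler."""
        for name, handler in self_description.items():
            self._providers.setdefault(name, {})[device_id] = handler

    def unplug(self, device_id):
        """Remove every capability the device had announced."""
        for handlers in self._providers.values():
            handlers.pop(device_id, None)

    def execute(self, name, *args):
        """Run an abstract capability on the first device that provides it."""
        handlers = self._providers.get(name)
        if not handlers:
            raise LookupError(f"no plugged-in device offers {name!r}")
        device_id, handler = next(iter(handlers.items()))
        return device_id, handler(*args)
```

An application written against the abstract name `grasp` keeps working when one gripper is unplugged and another device announcing the same capability is plugged in.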


Citations: 5

Recurrent Visual Relationship Recognition with Triplet Unit for Diversity

Int. J. Semantic Comput.

Pub Date : 2018-12-01 DOI: 10.1142/S1793351X18400214

Kento Masui, A. Ochiai, Shintaro Yoshizawa, Hideki Nakayama

The task of visual relationship recognition (VRR) is to recognize multiple objects and their relationships in an image. A fundamental difficulty of this task is class–number scalability, since the number of possible relationships we need to consider causes combinatorial explosion. Another difficulty of this task is modeling how to avoid outputting semantically redundant relationships. To overcome these challenges, this paper proposes a novel architecture with a recurrent neural network (RNN) and triplet unit (TU). The RNN allows our model to be optimized for outputting a sequence of relationships. By optimizing our model to a semantically diverse relationship sequence, we increase the variety in output relationships. At each step of the RNN, our TU enables the model to classify a relationship while achieving class–number scalability by decomposing a relationship into a subject–predicate–object (SPO) triplet. We evaluate our model on various datasets and compare the results to a baseline. These experimental results show our model’s superior recall and precision with fewer predictions compared to the baseline, even as it produces greater variety in relationships.
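The class-number scalability argument can be checked with simple arithmetic: treating every subject-predicate-object combination as its own class grows multiplicatively, while predicting the three parts separately grows additively. The sketch below shows both counts and a naive decoder that picks each part independently. The function names and the independent argmax are assumptions for illustration, and the recurrent part of the model is not represented.

```python
def joint_class_count(n_objects, n_predicates):
    """Classes needed if each (subject, predicate, object) is one class."""
    return n_objects * n_predicates * n_objects


def triplet_output_count(n_objects, n_predicates):
    """Outputs needed if subject, predicate, and object are predicted separately."""
    return n_objects + n_predicates + n_objects


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def decode_triplet(subject_scores, predicate_scores, object_scores,
                   objects, predicates):
    """Pick the best-scoring label for each part of the SPO triplet."""
    return (objects[argmax(subject_scores)],
            predicates[argmax(predicate_scores)],
            objects[argmax(object_scores)])
```

With 100 object classes and 70 predicates, the joint formulation would need 700,000 classes, whereas the decomposed one needs 270 outputs.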

Citations: 1

KaraMIR: A Project for Cover Song Identification and Singing Voice Analysis Using a Karaoke Songs Dataset

Int. J. Semantic Comput.

Pub Date : 2018-12-01 DOI: 10.1142/S1793351X18400202

Ladislav Marsik, Petr Martisek, J. Pokorný, M. Rusek, K. Slaninová, J. Martinovič, Matthias Robine, P. Hanna, Yann Bayle

We introduce KaraMIR, a musical project dedicated to karaoke song analysis. Within KaraMIR, we define Kara1k, a dataset composed of 1000 cover songs provided by the Recisio Karafun application and the corresponding 1000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, Kara1k enables novel approaches, as each cover song is a studio-recorded song with the same arrangement as the original recording, but with different singers and musicians. Essentia, harmony-analyser, Marsyas, Vamp plugins and YAAFE have been used to extract audio features for each track in Kara1k. We provide metadata such as the title, genre, original artist, year, and International Standard Recording Code, as well as the ground truths for the singer’s gender, backing vocals, duets, and lyrics’ language. The KaraMIR project focuses on defining new problems and describing features and tools to solve them. We thus provide a comparison of traditional and new features for a cover song identification task using statistical methods, as well as the dynamic time warping method on chroma, MFCC, chords, keys, and chord distance features. A supporting experiment on the singer gender classification task is also proposed. The KaraMIR project website facilitates continued research.
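For readers unfamiliar with dynamic time warping, the following is a minimal textbook sketch of the distance computation, not the KaraMIR implementation. The function name `dtw_distance`, the scalar toy sequences, and the absolute-difference local cost are assumptions for illustration; real chroma or MFCC features would be sequences of vectors with a vector distance passed as `dist`.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between two sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first i
    elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy example: the second sequence holds one value for an extra frame,
# so a warped alignment matches every element exactly.
original = [1, 2, 3]
cover = [1, 2, 2, 3]
print(dtw_distance(original, cover))
```

Because the alignment may repeat elements, two renditions of the same contour at different tempi can still obtain a low cost, which is why the method suits comparing a cover with its original.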

Citations: 1

Guest Editors' Introduction

Int. J. Semantic Comput.

Pub Date : 2018-12-01 DOI: 10.1142/S0218194011005207

A. Celentano, A. Yoshitaka

Citations: 0