"Embedded Semantic Lexicon Induction with Joint Global and Local Optimization"
S. Jauhar, E. Hovy. Joint Conference on Lexical and Computational Semantics, 2017-08-03. doi:10.18653/v1/S17-1025

Creating annotated frame lexicons such as PropBank and FrameNet is expensive and labor-intensive. We present a method to induce an embedded frame lexicon in a minimally supervised fashion using nothing more than unlabeled predicate-argument word pairs. We hypothesize that aggregating the selectional preferences of such pairs across training leads to a global understanding that captures predicate-argument frame structure. Our approach revolves around a novel integration of a predictive embedding model and an Indian Buffet Process posterior regularizer. Our experimental evaluation shows that we outperform baselines on two tasks and can learn an embedded frame lexicon that captures interesting generalities in relation to hand-crafted semantic frames.
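The paper's full model couples an embedding objective with an Indian Buffet Process regularizer; as a much simpler illustration of its starting point — aggregating unlabeled predicate-argument pairs into per-predicate selectional-preference distributions — here is a sketch (all pairs and counts hypothetical):

```python
from collections import Counter, defaultdict

def aggregate_preferences(pairs):
    """Aggregate unlabeled (predicate, argument) pairs into
    per-predicate selectional-preference distributions."""
    counts = defaultdict(Counter)
    for pred, arg in pairs:
        counts[pred][arg] += 1
    # Normalize raw counts into probability distributions per predicate.
    prefs = {}
    for pred, ctr in counts.items():
        total = sum(ctr.values())
        prefs[pred] = {arg: c / total for arg, c in ctr.items()}
    return prefs

pairs = [("eat", "bread"), ("eat", "apple"), ("eat", "bread"),
         ("drive", "car")]
prefs = aggregate_preferences(pairs)
# prefs["eat"]["bread"] == 2/3, prefs["drive"]["car"] == 1.0
```

The paper goes much further, learning embeddings whose latent binary features (induced by the IBP prior) group such distributions into frame-like structures.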
"Deep Learning Models For Multiword Expression Identification"
W. Gharbieh, V. Bhavsar, Paul Cook. Joint Conference on Lexical and Computational Semantics, 2017-08-01. doi:10.18653/v1/S17-1006

Multiword expressions (MWEs) are lexical items that can be decomposed into multiple component words but have properties that are unpredictable with respect to those components. In this paper we propose the first deep learning models for token-level identification of MWEs. Specifically, we consider a layered feedforward network, a recurrent neural network, and convolutional neural networks. Our experimental results show that convolutional neural networks outperform the previous state of the art for MWE identification, with a three-hidden-layer convolutional network giving the best performance.
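The core operation behind token-level identification with a convolutional network is sliding a filter over a window of neighboring token embeddings to produce one score per token. A minimal dependency-free sketch of that operation (toy embeddings and filter, not the paper's trained model):

```python
def conv1d_scores(embeddings, filt, bias=0.0):
    """Score each token with a 1-D convolution over a window of
    neighboring token embeddings (window width = len(filt))."""
    k = len(filt)        # filter width in tokens
    d = len(filt[0])     # embedding dimension
    pad = k // 2
    # Zero-pad so every token receives a score.
    padded = [[0.0] * d] * pad + embeddings + [[0.0] * d] * pad
    scores = []
    for i in range(len(embeddings)):
        window = padded[i:i + k]
        s = bias + sum(w[j] * f[j]
                       for w, f in zip(window, filt)
                       for j in range(d))
        scores.append(s)
    return scores

# Toy example: 2-dim embeddings, width-3 filter that only
# looks at the center token (identity weights there).
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filt = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
scores = conv1d_scores(emb, filt)
# → [1.0, 1.0, 2.0]: each score is the sum of the center embedding
```

In a real model many such filters run in parallel, followed by nonlinearities and further layers; the per-token scores feed a classifier that tags each token as inside or outside an MWE.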
"Detecting Asymmetric Semantic Relations in Context: A Case-Study on Hypernymy Detection"
Yogarshi Vyas, Marine Carpuat. Joint Conference on Lexical and Computational Semantics, 2017-08-01. doi:10.18653/v1/S17-1004

We introduce WHiC, a challenging testbed for detecting hypernymy, an asymmetric relation between words. While previous work has focused on detecting hypernymy between word types, we ground the meaning of words in specific contexts drawn from WordNet examples, and require predictions to be sensitive to changes in context. WHiC lets us analyze complementary properties of two approaches to inducing vector representations of word meaning in context. We show that such contextualized word representations also improve detection of a wider range of semantic relations in context.
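Because hypernymy is asymmetric, detection scores must differ depending on direction. One standard asymmetric measure from the distributional-inclusion literature (Weeds precision — shown here as background, not necessarily the scorer used in the paper) checks how much of the narrower word's feature mass is covered by the broader word's features:

```python
def weeds_precision(u, v):
    """Asymmetric inclusion score: the fraction of u's feature mass
    covered by v. A high score(u, v) with a low score(v, u) suggests
    v is the more general term, i.e. a candidate hypernym of u."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(u)
    return num / den if den else 0.0

# Hypothetical context-feature counts: "animal" spreads its mass
# over more contexts than "dog".
animal = [3.0, 2.0, 2.0, 1.0]
dog    = [3.0, 0.0, 1.0, 0.0]
weeds_precision(dog, animal)  # → 1.0 (dog's contexts all covered)
weeds_precision(animal, dog)  # → 0.5 (the reverse does not hold)
```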
"Comparing Approaches for Automatic Question Identification"
Angel Maredia, Kara Schechtman, Sarah Ita Levitan, Julia Hirschberg. Joint Conference on Lexical and Computational Semantics, 2017-08-01. doi:10.18653/v1/S17-1013

Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when the general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best-performing approach on the Columbia/SRI/Colorado corpus.
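When the topic inventory is known in advance, the simplest automatic-tagging baseline is keyword matching against per-topic word lists. A sketch of that baseline (topics and keywords hypothetical — the paper's approaches are more sophisticated):

```python
def tag_by_keywords(utterance, topic_keywords):
    """Assign the topic whose keyword list overlaps the utterance most;
    returns None when no keyword matches at all."""
    tokens = set(utterance.lower().split())
    best_topic, best_overlap = None, 0
    for topic, kws in topic_keywords.items():
        overlap = len(tokens & set(kws))
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic

topics = {"travel": ["flight", "hotel", "trip"],
          "food":   ["dinner", "restaurant", "menu"]}
tag_by_keywords("we booked a flight and a hotel", topics)  # → "travel"
```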
"Learning to Solve Geometry Problems from Natural Language Demonstrations in Textbooks"
Mrinmaya Sachan, E. Xing. Joint Conference on Lexical and Computational Semantics, 2017-08-01. doi:10.18653/v1/S17-1029

Humans as well as animals are good at imitation. Inspired by this, the learning-by-demonstration view of machine learning learns to perform a task from detailed example demonstrations. In this paper, we introduce the task of question answering using natural language demonstrations, where the question answering system is provided with detailed demonstrative solutions to questions in natural language. As a case study, we explore the task of learning to solve geometry problems using demonstrative solutions available in textbooks. We collect a new dataset of demonstrative geometry solutions from textbooks and explore approaches that learn to interpret these demonstrations as well as to use these interpretations to solve geometry problems. Our approaches show improvements over the best previously published system for solving geometry problems.
"Semantic Frames and Visual Scenes: Learning Semantic Role Inventories from Image and Video Descriptions"
Ekaterina Shutova, Andreas Wundsam, H. Yannakoudakis. Joint Conference on Lexical and Computational Semantics, 2017-08-01. doi:10.18653/v1/S17-1018

Frame-semantic parsing and semantic role labelling, which aim to automatically assign semantic roles to the arguments of verbs in a sentence, have become an active strand of research in NLP. However, to date these methods have relied on a predefined inventory of semantic roles. In this paper, we present a method to automatically learn argument role inventories for verbs from large corpora of text, images, and videos. We evaluate the method against manually constructed role inventories in FrameNet and show that the visual model outperforms the language-only model and operates with high precision.
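Inducing a role inventory without a predefined schema amounts to clustering the representations of a verb's arguments so that each cluster can be read as one role. A minimal, dependency-free k-means sketch of that idea (2-D toy "argument vectors", deterministic seeding — the paper's actual clustering over multimodal features is not specified here):

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: cluster argument vectors so that each cluster
    corresponds to one induced semantic role. Uses the first k points
    as seeds so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        # Update step: move each centroid to its members' mean.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

# Two well-separated blobs of hypothetical argument vectors
# → two induced roles.
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
assign = kmeans(pts, 2)
# → [0, 0, 1, 1]
```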
"Exploratory search on topics through different perspectives with DBpedia"
Nicolas Marie, Fabien L. Gandon, A. Giboin, Émilie Palagi. 2014-09-04. doi:10.1145/2660517.2660518

A promising scenario for combining linked data and search is exploratory search, in which the search objective is ill-defined and favorable to discovery. A common limitation of existing linked-data-based exploratory search systems is that they constrain the exploration through a single result-selection and ranking scheme: users cannot influence the results to reveal the specific aspects of knowledge that interest them. The models and algorithms we propose unveil such knowledge nuances by allowing topics to be explored from several perspectives. Users adjust important computation parameters through three operations that help retrieve the desired exploration perspectives: specifying interest criteria about the explored topic, injecting controlled randomness to reveal unexpected knowledge, and choosing the knowledge source(s) to process. This paper describes the corresponding models and algorithms and the Discovery Hub implementation, focusing on the three operations and their evaluations.
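Of the three operations, controlled randomness injection is the easiest to illustrate: relevance scores are blended with noise under a user-set parameter, so the ranking ranges smoothly from fully deterministic to fully serendipitous. A sketch of one possible blend (the item names and the linear mixing scheme are illustrative, not Discovery Hub's actual algorithm):

```python
import random

def rank_with_randomness(results, epsilon, seed=0):
    """Re-rank (item, score) pairs by mixing relevance with uniform
    noise; epsilon=0 keeps the original ranking, epsilon=1 is fully
    random. The seed makes the sketch reproducible."""
    rng = random.Random(seed)
    mixed = [(item, (1 - epsilon) * score + epsilon * rng.random())
             for item, score in results]
    return [item for item, _ in sorted(mixed, key=lambda x: -x[1])]

results = [("Turing", 0.9), ("Lovelace", 0.7), ("Hopper", 0.4)]
rank_with_randomness(results, 0.0)  # → ["Turing", "Lovelace", "Hopper"]
```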
"DataID: towards semantically rich metadata for complex datasets"
Martin Brümmer, C. Baron, I. Ermilov, M. Freudenberg, D. Kontokostas, Sebastian Hellmann. 2014-09-04. doi:10.1145/2660517.2660538

The constantly growing number of Linked Open Data (LOD) datasets creates a need for rich metadata descriptions that enable users to discover, understand, and process the available data. This metadata is often created, maintained, and stored in diverse data repositories featuring disparate data models, which are often unable to provide the metadata necessary to automatically process the datasets they describe. This paper proposes DataID, a best practice for LOD dataset descriptions that uses RDF files hosted together with the datasets under the same domain. We describe the data model, which is based on the widely used DCAT and VoID vocabularies, as well as supporting tools for creating and publishing DataIDs, and use cases that show the benefits of providing semantically rich metadata for complex datasets. As a proof of concept, we generated a DataID for the DBpedia dataset, which we present in the paper.
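To make the DCAT-based data model concrete, here is a sketch that emits a minimal dataset description in Turtle, in the spirit of DataID (the URIs, title, and publisher are hypothetical, and a real DataID document carries many more properties, including VoID statistics):

```python
def dataid_turtle(dataset_uri, title, publisher, distribution_url):
    """Emit a minimal DCAT-style dataset description in Turtle:
    one dcat:Dataset with a title, a publisher, and one
    dcat:Distribution pointing at a download URL."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<{dataset_uri}> a dcat:Dataset ;
    dct:title "{title}" ;
    dct:publisher "{publisher}" ;
    dcat:distribution [ a dcat:Distribution ;
        dcat:downloadURL <{distribution_url}> ] .
"""

ttl = dataid_turtle("http://example.org/dataset/demo", "Demo Dataset",
                    "Example Org", "http://example.org/demo.ttl")
```

Hosting such a file under the same domain as the dataset itself is the key DataID convention: a consumer that finds the data can find its machine-readable description in the same place.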
"Semantics for the music industry: the development of the music business ontology (MBO)"
Frank Schumacher, R. Gey, Stephan Klingner. 2014-09-04. doi:10.1145/2660517.2660531

In this paper we describe the development of the Music Business Ontology (MBO). The MBO was developed in response to data and communication problems in the music industry. Based on a qualitative pre-study, we analyzed the music industry, its players, and the data and software in use. First, we identified typical services and data formats; we then extracted concepts and properties from the music business. The design of the ontology was followed by the development of software tools serving well-defined tasks in the music business. As a result, the MBO increases the transparency of the music business and fosters a better understanding of the business itself among its actors. The introduction of the Music Business Ontology changes the way actors and systems in the music business interact with each other: it decreases the need for different interfaces and formats and thus considerably reduces complexity.
"Detecting EPCIS exceptions in linked traceability streams across supply chain business processes"
M. Solanki, C. Brewster. 2014-09-04. doi:10.1145/2660517.2660524

The EPCIS specification provides an event-oriented mechanism for recording product movement information across stakeholders in supply chain business processes. Besides enabling the sharing of event-based traceability datasets, track-and-trace implementations must also be equipped to validate integrity constraints and detect runtime exceptions without compromising the time-to-deliver schedule of the shipping and receiving parties. In this paper we present a methodology for detecting exceptions that arise during the processing of EPCIS event datasets. We propose an extension to the EEM ontology for modelling EPCIS exceptions and show how runtime exceptions can be detected and reported. We exemplify and evaluate our approach on an abstraction of pharmaceutical supply chains.
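One representative integrity constraint over such event streams is that a product must be shipped before it can be received. A minimal sketch of detecting violations of that single constraint (the event tuples and EPC identifiers are hypothetical, and the paper's EEM-based approach expresses such constraints ontologically rather than in code):

```python
def detect_exceptions(events):
    """Flag 'receive' events whose EPC was never shipped beforehand —
    one simple integrity constraint over an EPCIS-like event stream.
    Events are (action, epc) tuples assumed to be time-ordered."""
    shipped = set()
    exceptions = []
    for action, epc in events:
        if action == "ship":
            shipped.add(epc)
        elif action == "receive" and epc not in shipped:
            exceptions.append(epc)
    return exceptions

events = [("ship", "epc:001"), ("receive", "epc:001"),
          ("receive", "epc:002")]   # epc:002 was never shipped
detect_exceptions(events)  # → ["epc:002"]
```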