
Latest publications in VS@HLT-NAACL

Short Text Clustering via Convolutional Neural Networks
Pub Date: 2015-06-01 DOI: 10.3115/v1/W15-1509
Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao
Short text clustering has become an increasingly important task with the popularity of social media, but it remains challenging due to the sparseness of short text representations. In this paper, we propose Short Text Clustering via Convolutional neural networks (STCC), which benefits clustering by imposing a constraint on the learned features through a self-taught learning framework, without using any external tags/labels. First, we embed the original keyword features into compact binary codes under a locality-preserving constraint. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary codes during training. After obtaining the learned representations, we cluster them with K-means. Our extensive experimental study on two public short text datasets shows that the deep feature representations learned by our approach achieve significantly better clustering performance than existing features such as term frequency-inverse document frequency, Laplacian eigenvectors, and averaged embeddings.
{"title":"Short Text Clustering via Convolutional Neural Networks","authors":"Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao","doi":"10.3115/v1/W15-1509","DOIUrl":"https://doi.org/10.3115/v1/W15-1509","url":null,"abstract":"Short text clustering has become an increasing important task with the popularity of social media, and it is a challenging problem due to its sparseness of text representation. In this paper, we propose a Short Text Clustering via Convolutional neural networks (abbr. to STCC), which is more beneficial for clustering by considering one constraint on learned features through a self-taught learning framework without using any external tags/labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embed-dings are explored and fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary code in the training process. After obtaining the learned representations, we use K-means to cluster them. Our extensive experimental study on two public short text datasets shows that the deep feature representation learned by our approach can achieve a significantly better performance than some other existing features, such as term frequency-inverse document frequency, Laplacian eigenvectors and average embedding, for clustering.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129174249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 148
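The STCC pipeline ends with binary coding of features and K-means over the learned representations. The sketch below is only a toy stand-in under stated assumptions: the locality-preserving binary coding is approximated here by a random projection with per-dimension median thresholding, the "deep features" are synthetic blobs rather than CNN outputs, and `binary_codes`/`kmeans` are hypothetical helper names, not the authors' code.

```python
import numpy as np

def binary_codes(features, n_bits, rng):
    """Project features to n_bits dimensions and threshold at the
    per-dimension median -- a crude stand-in for the paper's
    locality-preserving binary coding."""
    proj = features @ rng.normal(size=(features.shape[1], n_bits))
    return (proj > np.median(proj, axis=0)).astype(float)

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-means on the learned representations."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(0)
# toy "deep features": two well-separated blobs of short-text vectors
X = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(3, 0.1, (20, 8))])
codes = binary_codes(X, n_bits=16, rng=rng)
labels = kmeans(X, k=2, rng=rng)
```

With well-separated inputs, K-means recovers the two groups regardless of which points seed the centers.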
Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing
Pub Date: 2015-06-01 DOI: 10.3115/v1/W15-1517
Melanie Tosik, C. Hansen, Gerard Goossen, M. Rotaru
We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on word embeddings in Natural Language Processing (NLP). Our approach integrates the word embeddings as input features for a probabilistic sequence labeling model based on the Conditional Random Field (CRF) framework. The best-performing word embeddings are generated from a large sample of German CVs. The best results on the extraction task are obtained by the model that integrates the word embeddings with a number of hand-crafted features. The improvements are consistent across different sections of the target documents. The effect of the word embeddings is strongest on semi-structured, out-of-sample data.
{"title":"Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing","authors":"Melanie Tosik, C. Hansen, Gerard Goossen, M. Rotaru","doi":"10.3115/v1/W15-1517","DOIUrl":"https://doi.org/10.3115/v1/W15-1517","url":null,"abstract":"We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on the application of word embeddings in Natural Language Processing (NLP). Our approach integrates the word embeddings as input features for a probabilistic sequence labeling model that relies on the Conditional Random Field (CRF) framework. Best-performing word embeddings are generated from a large sample of German CVs. The best results on the extraction task are obtained by the model which integrates the word embeddings together with a number of hand-crafted features. The improvements are consistent throughout different sections of the target documents. The effect of the word embeddings is strongest on semi-structured, out-of-sample data.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"410 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126689825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
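The core idea — per-token observation vectors that concatenate word embeddings with hand-crafted indicator features, to be consumed by a CRF — can be sketched as below. Everything here is illustrative: the embedding table is random (the paper trains embeddings on German CVs), the vocabulary and `handcrafted`/`featurize` helpers are hypothetical, and the CRF layer itself is omitted.

```python
import numpy as np

# Toy embedding table; the paper trains its embeddings on a large
# sample of German CVs (random vectors here, for illustration only).
EMB_DIM = 4
rng = np.random.default_rng(1)
vocab = {"Max": 0, "arbeitete": 1, "bei": 2, "Siemens": 3, "<unk>": 4}
emb = rng.normal(size=(len(vocab), EMB_DIM))

def handcrafted(tok):
    """A few indicator features of the kind CRF taggers typically use."""
    return np.array([
        tok[0].isupper(),               # capitalized
        any(c.isdigit() for c in tok),  # contains a digit
        len(tok) > 6,                   # "long" token
    ], dtype=float)

def featurize(tokens):
    """Concatenate embedding and hand-crafted features per token,
    yielding the observation matrix a CRF layer would consume."""
    rows = []
    for tok in tokens:
        e = emb[vocab.get(tok, vocab["<unk>"])]
        rows.append(np.concatenate([e, handcrafted(tok)]))
    return np.stack(rows)

X = featurize(["Max", "arbeitete", "bei", "Siemens"])
```

Each row is one token: 4 embedding dimensions followed by 3 indicator features.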
Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.
Pub Date: 2015-06-01 DOI: 10.3115/v1/W15-1518
Garrett Nicolai, Colin Cherry, Grzegorz Kondrak
We replicate the syntactic experiments of Mikolov et al. (2013b) on English, and expand them to include morphologically complex languages. We learn vector representations for Dutch, French, German, and Spanish with the WORD2VEC tool, and investigate to what extent inflectional information is preserved across vectors. We observe that the accuracy of vectors on a set of syntactic analogies is inversely correlated with the morphological complexity of the language.
{"title":"Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.","authors":"Garrett Nicolai, Colin Cherry, Grzegorz Kondrak","doi":"10.3115/v1/W15-1518","DOIUrl":"https://doi.org/10.3115/v1/W15-1518","url":null,"abstract":"We replicate the syntactic experiments of Mikolov et al. (2013b) on English, and expand them to include morphologically complex languages. We learn vector representations for Dutch, French, German, and Spanish with the WORD2VEC tool, and investigate to what extent inflectional information is preserved across vectors. We observe that the accuracy of vectors on a set of syntactic analogies is inversely correlated with the morphological complexity of the language.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133528913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
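The syntactic-analogy evaluation replicated here follows Mikolov et al.'s vector-offset method: answer "a is to b as c is to ?" by the nearest cosine neighbour of b − a + c, excluding the query words. The sketch below uses hand-built toy vectors in which the past-tense offset is exactly consistent across verbs — real word2vec vectors only approximate this.

```python
import numpy as np

# Toy embedding space with an exact past-tense offset (third dimension).
emb = {
    "walk":    np.array([1.0, 0.0, 0.0]),
    "walked":  np.array([1.0, 0.0, 1.0]),
    "jump":    np.array([0.0, 1.0, 0.0]),
    "jumped":  np.array([0.0, 1.0, 1.0]),
    "jumping": np.array([0.0, 1.0, 2.0]),
}

def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by the nearest cosine neighbour
    of b - a + c, excluding the three query words."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in emb.items() if w not in {a, b, c}}
    return max(candidates, key=lambda w: cos(candidates[w], target))

answer = solve_analogy("walk", "walked", "jump")
```

For morphologically rich languages the offset is noisier — one inflectional relation fans out over many surface forms, which is consistent with the inverse correlation the paper reports.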
Relation Extraction: Perspective from Convolutional Neural Networks
Pub Date: 2015-06-01 DOI: 10.3115/v1/W15-1506
Thien Huu Nguyen, R. Grishman
Up to now, relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors in relation detection and classification. In this work, we depart from these traditional approaches and their complicated feature engineering by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes dependence on external toolkits and resources. Our model takes advantage of multiple filter window sizes and pre-trained word embeddings as an initializer in a non-static architecture to improve performance. We also address the relation extraction problem on an unbalanced corpus. The experimental results show that our system significantly outperforms not only the best baseline systems for relation extraction but also state-of-the-art systems for relation classification.
{"title":"Relation Extraction: Perspective from Convolutional Neural Networks","authors":"Thien Huu Nguyen, R. Grishman","doi":"10.3115/v1/W15-1506","DOIUrl":"https://doi.org/10.3115/v1/W15-1506","url":null,"abstract":"Up to now, relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors of relation detection and classification. In this work, we depart from these traditional approaches with complicated feature engineering by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources. Our model takes advantages of multiple window sizes for filters and pre-trained word embeddings as an initializer on a non-static architecture to improve the performance. We emphasize the relation extraction problem with an unbalanced corpus. The experimental results show that our system significantly outperforms not only the best baseline systems for relation extraction but also the state-of-the-art systems for relation classification.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"11 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124258157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 464
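The "multiple window sizes" idea amounts to convolving filters of several widths over the token-embedding matrix and max-pooling each one over positions. The sketch below is a minimal numpy version under assumptions of my own: one scalar feature per filter, random filters and embeddings, and arbitrary filter counts — not the paper's trained network.

```python
import numpy as np

def conv_max_pool(sent_emb, filters):
    """Slide each filter (window_size x emb_dim) over the token
    embeddings, then max-pool over positions -- one scalar per filter."""
    n, _ = sent_emb.shape
    feats = []
    for W in filters:
        w = W.shape[0]
        scores = [np.sum(W * sent_emb[i:i + w]) for i in range(n - w + 1)]
        feats.append(max(scores))
    return np.array(feats)

rng = np.random.default_rng(0)
emb_dim, sent_len = 5, 9
sent = rng.normal(size=(sent_len, emb_dim))  # stand-in word embeddings

# Several window sizes, as in the paper; 3 filters per size is arbitrary.
filters = [rng.normal(size=(w, emb_dim)) for w in (2, 3, 4, 5) for _ in range(3)]
sentence_rep = conv_max_pool(sent, filters)
```

The pooled vector (here 4 window sizes × 3 filters = 12 values) is what a softmax classification layer would consume.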
Semantic Information Extraction for Improved Word Embeddings
Pub Date: 2015-06-01 DOI: 10.3115/v1/W15-1523
Jiaqiang Chen, Gerard de Melo
Word embeddings have recently proven useful in a number of different applications that deal with natural language. Such embeddings succinctly reflect semantic similarities between words based on their sentence-internal contexts in large corpora. In this paper, we show that information extraction techniques provide valuable additional evidence of semantic relationships that can be exploited when producing word embeddings. We propose a joint model to train word embeddings both on regular context information and on more explicit semantic extractions. The word vectors obtained from such an augmented joint training show improved results on word similarity tasks, suggesting that they can be useful in applications that involve word meanings.
{"title":"Semantic Information Extraction for Improved Word Embeddings","authors":"Jiaqiang Chen, Gerard de Melo","doi":"10.3115/v1/W15-1523","DOIUrl":"https://doi.org/10.3115/v1/W15-1523","url":null,"abstract":"Word embeddings have recently proven useful in a number of different applications that deal with natural language. Such embeddings succinctly reflect semantic similarities between words based on their sentence-internal contexts in large corpora. In this paper, we show that information extraction techniques provide valuable additional evidence of semantic relationships that can be exploited when producing word embeddings. We propose a joint model to train word embeddings both on regular context information and on more explicit semantic extractions. The word vectors obtained from such an augmented joint training show improved results on word similarity tasks, suggesting that they can be useful in applications that involve word meanings.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129471683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
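A joint objective of this kind adds, on top of the usual context loss, a term that pulls together word vectors linked by an explicit semantic extraction. The sketch below shows only that extra term as a squared-distance penalty optimized by SGD, with made-up words and the context loss omitted entirely — a simplification of mine, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=4) for w in ("car", "automobile", "drive")}

def sgd_step(w1, w2, lr=0.1):
    """One gradient step on a squared-distance penalty that pulls an
    extracted pair (w1, w2) together -- the 'explicit semantic
    extraction' term of a joint objective (context term omitted)."""
    grad = 2 * (vecs[w1] - vecs[w2])
    vecs[w1] -= lr * grad
    vecs[w2] += lr * grad

before = np.linalg.norm(vecs["car"] - vecs["automobile"])
for _ in range(10):
    sgd_step("car", "automobile")
after = np.linalg.norm(vecs["car"] - vecs["automobile"])
```

Each step shrinks the pair's difference vector by a constant factor (here 0.6), so extracted synonyms drift together while unrelated words like "drive" are untouched by this term.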