Fusing domain-specific data with general data for in-domain applications

An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen
Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, published 2017-08-23
DOI: 10.1145/3106426.3106473 (https://doi.org/10.1145/3106426.3106473)
Citations: 1

Abstract

This paper analyzes the lexical semantics of domain-specific terms using various pre-trained domain-specific and general-domain word vectors, and addresses the semantic drift between domains. To capture lexical semantics in a specific domain, we propose a bridge mechanism that introduces domain-specific data into general data and then re-trains the word vectors. We find that even a small-scale fusion yields lexical semantics similar to those learned from a large-scale domain-specific dataset. Experiments on sentiment analysis and outlier detection show that word embeddings trained on the fusion dataset perform better than embeddings trained on either a purely domain-specific or a purely general large dataset. This simple but effective methodology facilitates the domain adaptation of distributed word representations.
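The fusion idea can be illustrated with a toy sketch. This is not the authors' bridge mechanism, whose details the abstract does not give. It assumes the simplest possible fusion (appending a few domain sentences to the general corpus) and swaps in raw co-occurrence count vectors for trained word embeddings so that it runs with the standard library only. The corpora, the function names (`cooccurrence_vectors`, `cosine`, `fuse`), and the window size are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse(general, domain, sample_size):
    """Naive fusion (an assumption): append the first `sample_size` domain sentences."""
    return general + domain[:sample_size]


# Hypothetical toy corpora: "virus" is medical in the general data,
# but a software term in the (computer security) domain data.
general = [
    ["the", "virus", "causes", "flu", "infection"],
    ["doctors", "treat", "the", "flu", "infection"],
    ["malware", "damages", "computer", "software"],
]
domain = [
    ["the", "virus", "infects", "computer", "software"],
    ["antivirus", "removes", "virus", "and", "malware"],
]

general_vecs = cooccurrence_vectors(general)
fused_vecs = cooccurrence_vectors(fuse(general, domain, sample_size=2))

# Similarity of "virus" to "malware" before and after fusion.
sim_general = cosine(general_vecs.get("virus", Counter()),
                     general_vecs.get("malware", Counter()))
sim_fused = cosine(fused_vecs.get("virus", Counter()),
                   fused_vecs.get("malware", Counter()))
print(sim_general, sim_fused)
```

In this toy setting, "virus" and "malware" share no context words in the general corpus, and adding two domain sentences is enough to give them overlapping contexts. A realistic experiment would re-train an embedding model on the fused corpus and evaluate it on downstream tasks, as the abstract describes.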