Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning

V. Zheng, K. Chang
{"title":"Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning","authors":"V. Zheng, K. Chang","doi":"10.1145/2983323.2983860","DOIUrl":null,"url":null,"abstract":"Constraints have been shown as an effective way to incorporate unlabeled data for semi-supervised structured classification. We recognize that, constraints are often conditional and probabilistic; moreover, a constraint can have its condition depend on either just observations (which we call x-type constraint) or even hidden variables (which we call y-type constraint). We wish to design a constraint formulation that can flexibly model the constraint probability for both x-type and y-type constraints, and later use it to regularize general structured classifiers for semi-supervision. Surprisingly, none of the existing models have such a constraint formulation. Thus in this paper, we propose a new conditional probabilistic formulation for modeling both x-type and y-type constraints. We also recognize the inference complication for y-type constraint, and propose a systematic selective evaluation approach to efficiently realize the constraints. Finally, we evaluate our model in three applications, including named entity recognition, part-of-speech tagging and entity information extraction, with totally nine data sets. We show that our model is generally more accurate and efficient than the state-of-the-art baselines. Our code and data are available at https://bitbucket.org/vwz/cikm2016-cpf/.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983323.2983860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Constraints have been shown as an effective way to incorporate unlabeled data for semi-supervised structured classification. We recognize that, constraints are often conditional and probabilistic; moreover, a constraint can have its condition depend on either just observations (which we call x-type constraint) or even hidden variables (which we call y-type constraint). We wish to design a constraint formulation that can flexibly model the constraint probability for both x-type and y-type constraints, and later use it to regularize general structured classifiers for semi-supervision. Surprisingly, none of the existing models have such a constraint formulation. Thus in this paper, we propose a new conditional probabilistic formulation for modeling both x-type and y-type constraints. We also recognize the inference complication for y-type constraint, and propose a systematic selective evaluation approach to efficiently realize the constraints. Finally, we evaluate our model in three applications, including named entity recognition, part-of-speech tagging and entity information extraction, with totally nine data sets. We show that our model is generally more accurate and efficient than the state-of-the-art baselines. Our code and data are available at https://bitbucket.org/vwz/cikm2016-cpf/.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于条件概率约束的半监督学习正则化结构化分类器
约束已被证明是将未标记数据纳入半监督结构化分类的有效方法。我们认识到,约束通常是有条件的和概率的;此外,约束的条件可以只依赖于观察值(我们称之为x型约束),也可以依赖于隐藏变量(我们称之为y型约束)。我们希望设计一种约束公式,可以灵活地对x型和y型约束的约束概率进行建模,然后将其用于正则化一般结构化分类器进行半监督。令人惊讶的是,现有的模型都没有这样的约束公式。因此,在本文中,我们提出了一个新的条件概率公式来建模x型和y型约束。认识到y型约束的推理复杂性,提出了一种系统的选择性评价方法来有效地实现约束。最后,我们在命名实体识别、词性标注和实体信息提取三个方面对模型进行了评估,共使用了9个数据集。我们表明,我们的模型通常比最先进的基线更准确和有效。我们的代码和数据可在https://bitbucket.org/vwz/cikm2016-cpf/上获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model Approximate Discovery of Functional Dependencies for Large Datasets Mining Shopping Patterns for Divergent Urban Regions by Incorporating Mobility Data A Personal Perspective and Retrospective on Web Search Technology
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1