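Use of Metric Learning for the Recognition of Handwritten Digits, and its Application to Increase the Outreach of Voice-based Communication Platforms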

D. Pant, Dibyendu Talukder, D. Kumar, Rachit Pandey, Aaditeshwar Seth, Chetan Arora
Venue: ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies (COMPASS)
Published: 2022-06-29
DOI: https://doi.org/10.1145/3530190.3534795
Citations: 1

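Abstract

Initiating, monitoring, and evaluating development programmes often involves field-based data collection about project activities. Collecting this data on digital devices is not always feasible, however, for reasons such as field-based cadre being unable to afford smartphones and tablets, or shortfalls in their training and capacity building. Paper-based data collection has been argued to be more appropriate in several contexts, with the paper forms digitized automatically through OCR (Optical Character Recognition) and OMR (Optical Mark Recognition) techniques. We contribute a large dataset of handwritten digits, together with deep learning based models and methods built using this data that are effective in real-world environments. We demonstrate the deployment of these tools in a maternal and child health and nutrition awareness project, which uses IVR (Interactive Voice Response) systems to provide awareness information to rural women who are SHG (Self Help Group) members in north India. Paper forms were used to collect phone numbers of the SHG members at scale; these were digitized using the OCR tools we developed and used to push almost 4 million phone calls. The data, model, and code have been released as open source.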