Robust OCR Pipeline for Automated Digitization of Mother and Child Protection Cards in India

D. Pant, Dibyendu Talukder, Aaditeshwar Seth, Dinesh Pant, Rohit Singh, Brejesh Dua, Rachit Pandey, Srirama Maruthi, M. Johri, Chetan Arora
Journal: ACM Journal on Computing and Sustainable Societies
Publication type: Journal Article
Published: 2023-08-05
DOI: https://doi.org/10.1145/3608114

Abstract

India's Universal Immunization Programme has a mandate to fully vaccinate all of the 27 million children born in India each year. Frontline health workers record vaccination doses on standardized paper-based Mother and Child Protection (MCP) cards, which data entry operators then digitize manually, resulting in poor data quality, delays, and a significant expenditure of time and resources. In this article, we focus on Optical Character Recognition (OCR)-based automated digitization of MCP card images captured through a smartphone application that we developed. Using the standardized MCP card template, which is available a priori, we register the card images against it and perform OCR on the extracted regions of interest (ROIs). Because cards with curvature or torn edges yielded poor ROIs, we built a global–local alignment technique that first approximates each ROI using a global homography and then refines it using a local homography, which improves accuracy. Our pipeline achieves a character-level accuracy of 98.73% on our dataset, compared with 75.02% for Google Cloud Vision and 79.26% for Azure OCR. We also describe our field-testing experience, in which the digitized MCP card images were used to provide useful features in the smartphone application for health workers conducting vaccination sessions.
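The global-then-local idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how correspondences are found or how the local homography is estimated. The sketch assumes that four template-to-photo correspondences are already available for the whole card and four more for landmarks near one ROI. All coordinates, names, and helper functions below are hypothetical, and a real pipeline would more likely fit each homography robustly over many feature matches.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Exact homography taking four src points to four dst points (h33 = 1)."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(h: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def locate_roi(roi: Sequence[Point], global_h: Matrix,
               landmarks: Sequence[Point], observed: Sequence[Point]) -> List[Point]:
    """Map template ROI corners into the photo: global estimate, then local fix."""
    # Where the global homography predicts the nearby landmarks should be.
    predicted = [apply_h(global_h, p) for p in landmarks]
    # Local homography corrects the residual between prediction and observation.
    local_h = homography_from_4(predicted, observed)
    return [apply_h(local_h, apply_h(global_h, c)) for c in roi]


# Hypothetical numbers, for illustration only (template units vs. photo pixels).
CARD_TEMPLATE = [(0, 0), (1000, 0), (1000, 700), (0, 700)]
CARD_PHOTO = [(50, 40), (1040, 60), (1020, 760), (30, 730)]
LANDMARKS = [(200, 300), (400, 300), (400, 380), (200, 380)]  # template, near ROI
OBSERVED = [(243, 345), (441, 347), (440, 428), (240, 424)]   # same landmarks in photo
ROI = [(220, 310), (380, 310), (380, 370), (220, 370)]

global_h = homography_from_4(CARD_TEMPLATE, CARD_PHOTO)
coarse = [apply_h(global_h, c) for c in ROI]
refined = locate_roi(ROI, global_h, LANDMARKS, OBSERVED)
for c, r in zip(coarse, refined):
    print(f"global only ({c[0]:.1f}, {c[1]:.1f}) -> refined ({r[0]:.1f}, {r[1]:.1f})")
```

In this sketch the global homography alone places the ROI where a perfectly flat card would put it. The local step then shifts the corners toward where the nearby landmarks were actually observed, which is one way that curvature or a torn edge could be compensated for one region at a time.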