WebEvo: taming web application evolution via detecting semantic structure changes

Fei Shao, Ruiwen Xu, W. Haque, Jingwei Xu, Ying Zhang, Wei Yang, Yanfang Ye, Xusheng Xiao
Published in: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021-07-11
DOI: 10.1145/3460319.3464800 (https://doi.org/10.1145/3460319.3464800)
Citations: 5

Abstract

Advances in Web technology and the arrival of the Big Data era have driven the development of tools that extract data from websites, such as information retrieval (IR) and robotic process automation (RPA) tools. Because websites evolve constantly, it is important to monitor their changes and report them to developers and testers so that these tools do not malfunction. Existing monitoring tools mainly use DOM-tree based techniques to detect changes in new web pages. However, these tools incorrectly report content-based changes (i.e., web content that is refreshed every time a web page is retrieved) as changes that will adversely affect the performance of the IR and RPA tools. This produces false warnings, since the IR and RPA tools typically treat such changes as expected and retrieve dynamic data from them. Moreover, these monitoring tools cannot identify GUI widget evolution (e.g., moving a button), and thus cannot help the IR and RPA tools adapt to the evolved widgets (e.g., by automatically repairing locators for the evolved widgets). To address these limitations, we propose WebEvo, an approach that leverages historic pages to identify the DOM elements whose changes are content-based and can therefore be safely ignored when reporting changes in new web pages. Furthermore, to identify refactoring changes that preserve the semantics and appearance of GUI widgets, WebEvo adapts computer vision (CV) techniques to map GUI widgets from the old web page to the new web page on an element-by-element basis. Empirical evaluations on 13 real-world websites from 9 popular categories demonstrate that WebEvo outperforms existing DOM-tree based detection and whole-page visual comparison in both effectiveness and efficiency.
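
The idea of using historic pages to filter out content-based changes can be illustrated with a minimal sketch. This is not WebEvo's implementation: the abstract does not describe its data structures, so the snapshot representation (a dictionary from an element locator such as an XPath to its text) and the function names `content_based_locators` and `report_changes` are assumptions made purely for illustration. The sketch also omits the CV-based widget mapping entirely.

```python
from typing import Dict, List, Set

# Hypothetical representation: element locator (e.g. an XPath) -> text content.
Snapshot = Dict[str, str]


def content_based_locators(history: List[Snapshot]) -> Set[str]:
    """Return locators whose text differs across historic snapshots.

    Such elements are assumed to hold dynamic content that refreshes on
    every retrieval, so later changes to them are treated as expected.
    """
    dynamic: Set[str] = set()
    if len(history) < 2:
        return dynamic
    baseline = history[0]
    for snapshot in history[1:]:
        # Only locators present in both snapshots can be compared.
        for locator in baseline.keys() & snapshot.keys():
            if baseline[locator] != snapshot[locator]:
                dynamic.add(locator)
    return dynamic


def report_changes(old: Snapshot, new: Snapshot, dynamic: Set[str]) -> Dict[str, List[str]]:
    """Diff two snapshots, ignoring text changes on content-based locators."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(
        locator
        for locator in old.keys() & new.keys()
        if locator not in dynamic and old[locator] != new[locator]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Illustrative data (invented for this example, not taken from the paper).
history = [
    {"/html/body/h1": "News", "/html/body/div[1]": "Headline A", "/html/body/button": "Search"},
    {"/html/body/h1": "News", "/html/body/div[1]": "Headline B", "/html/body/button": "Search"},
]
new_page = {"/html/body/h1": "News", "/html/body/div[1]": "Headline C", "/html/body/button": "Find"}

dynamic = content_based_locators(history)
changes = report_changes(history[-1], new_page, dynamic)
print(dynamic)
print(changes)
```

In this toy input, the headline element varies across the historic snapshots, so its latest change is suppressed, while the relabeled button is still reported. A plain DOM diff between the last historic page and the new page would flag both.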