Learning Semantically Rich Network-based Multi-modal Mobile User Interface Embeddings
Authors: Gary Ang, Ee-Peng Lim
Journal: ACM Transactions on Interactive Intelligent Systems (Computer Science; JCR Q2, Computer Science, Artificial Intelligence; impact factor 3.6)
Published: 2022-11-04
DOI: https://dl.acm.org/doi/10.1145/3533856
Semantically rich information from multiple modalities (text, code, images, categorical and numerical data) coexists in the user interface (UI) design of mobile applications. Moreover, each UI design is composed of inter-linked UI entities that support different functions of an application, e.g., a UI screen comprising a UI taskbar, a menu, and multiple button elements. Existing UI representation learning methods are not designed to capture both the multi-modal information and the linkage structure between UI entities. To support effective search and recommendation applications over mobile UIs, we need UI representations that integrate the latent semantics present in both the multi-modal information and the linkages between UI entities. In this article, we present a novel self-supervised model, the Multi-modal Attention-based Attributed Network Embedding (MAAN) model. MAAN is designed to capture the structural network information present in the linkages between UI entities, as well as the multi-modal attributes of the UI entity nodes. Based on the variational autoencoder framework, MAAN learns semantically rich UI embeddings in a self-supervised manner by reconstructing the attributes of UI entities and the linkages between them. The generated embeddings can be applied to a variety of downstream tasks: predicting UI elements associated with UI screens, inferring missing UI screen and element attributes, predicting UI user ratings, and retrieving UIs. Extensive experiments, including user evaluations, conducted on datasets from RICO, a rich real-world mobile UI repository, demonstrate that MAAN outperforms other state-of-the-art models. The number of linkages between UI entities can provide further information on the role of different UI entities in UI designs. However, MAAN does not capture edge attributes. To extend and generalize MAAN to learn even richer UI embeddings, we further propose EMAAN to capture edge attributes. We conduct additional extensive experiments on EMAAN, which show that it improves on the performance of MAAN and similarly outperforms state-of-the-art models.
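The abstract describes the training signal only at a high level: a variational autoencoder that reconstructs both node attributes and the linkages between nodes. The sketch below illustrates that general kind of objective on a toy graph of one screen and three elements. It is not the authors' MAAN architecture: the neighbour-averaging encoder, the inner-product link decoder, the linear attribute decoder, the equal weighting of the three loss terms, and every name in the code (`vae_loss`, `W_mu`, `W_logvar`, `W_dec`) are assumptions made for illustration, and the multi-modal attention component is omitted entirely.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def vae_loss(attrs, adj, W_mu, W_logvar, W_dec, rng):
    """Return (total, (link_loss, attr_loss, kl)) for one forward pass."""
    n, d = len(attrs), len(attrs[0])
    # Encoder (assumed): average each node's attributes with its neighbours'.
    hidden = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adj[i][j]]
        hidden.append([sum(attrs[j][k] for j in group) / len(group) for k in range(d)])
    mu = [matvec(W_mu, h) for h in hidden]
    logvar = [matvec(W_logvar, h) for h in hidden]
    # Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_i, lv_i)]
         for mu_i, lv_i in zip(mu, logvar)]
    # Linkage reconstruction: binary cross-entropy over all node pairs.
    link_loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            p = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
            link_loss -= math.log(p) if adj[i][j] else math.log(1.0 - p)
            pairs += 1
    link_loss /= pairs
    # Attribute reconstruction: mean squared error from a linear decoder.
    attr_loss = 0.0
    for x, z_i in zip(attrs, z):
        x_hat = matvec(W_dec, z_i)
        attr_loss += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / d
    attr_loss /= n
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = 0.0
    for mu_i, lv_i in zip(mu, logvar):
        kl += -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu_i, lv_i))
    kl /= n
    return link_loss + attr_loss + kl, (link_loss, attr_loss, kl)


# Toy data: node 0 is a screen, nodes 1-3 are elements linked to it.
attrs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.0, 1.0, 0.3], [0.0, 1.0, 0.9]]
adj = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

rng = random.Random(0)
latent_dim, attr_dim = 2, 3
W_mu = rand_matrix(latent_dim, attr_dim, rng)
W_logvar = rand_matrix(latent_dim, attr_dim, rng)
W_dec = rand_matrix(attr_dim, latent_dim, rng)

total, parts = vae_loss(attrs, adj, W_mu, W_logvar, W_dec, rng)
print("total loss:", total)
print("link, attribute, KL:", parts)
```

In a real system the three weight matrices would be trained by gradient descent on `total`, and the per-node means `mu` would serve as the embeddings passed to downstream tasks such as retrieval.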
About the journal:
The ACM Transactions on Interactive Intelligent Systems (TiiS) publishes papers on research concerning the design, realization, or evaluation of interactive systems that incorporate some form of machine intelligence. TiiS articles come from a wide range of research areas and communities. An article can take any of several complementary views of interactive intelligent systems, focusing on:
the intelligent technology,
the interaction of users with the system, or
both aspects at once.