Tongtong Xie, Haiying Ye, Hongyan Wang, J. V. D. Weijer
The tongue twister paradigm was used to compare the number and types of errors made by native and non-native speakers of Chinese when producing tongue twisters. The stimuli consisted of 106 quadruples: 32 transliterated from English tongue twisters, 26 vocalic twisters, and 48 consonant twisters. Both consonant and vowel errors were investigated (but not tone errors), and errors were classified as caused by preceding or following linguistic forms (or by both, or neither). To elicit more errors, participants were asked to speak at a rate 20% faster than their normal rate. Four native Mandarin Chinese speakers and six foreign learners of Chinese read the tongue twisters aloud, repeating each one four times per slide. The native speakers made a total of 606 errors, and the non-native speakers produced 3970. The results show a clear difference between L1 and L2 speakers and a relation between years of learning Chinese and total number of errors.
The Study of Phonological Neighborhoods in Chinese L1 and L2 Speech Production. Tongtong Xie, Haiying Ye, Hongyan Wang, J. V. D. Weijer. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446135
This paper proposes a powerful feature fusion method for object detection. A significant accuracy improvement is achieved by augmenting all multi-scale features while adding only a modest amount of computation. We build our detector on the fast SSD detector [1] and call it Full Feature Fusion Network (F3N). Using several feature fusion modules, we fuse low-level and high-level features through parallel low-high level sub-networks with repeated information exchange across multi-scale features, combining all multi-scale features with concatenation and interpolation operations. F3N achieves a new state-of-the-art result for one-stage object detection: with 512x512 input it achieves 82.5% mAP (mean Average Precision) and with 320x320 input 80.3% on the VOC2007 test set; on the VOC2012 test set, 512x512 input achieves 81.1% and 320x320 input yields 77.3%. On the MS COCO dataset, 512x512 input obtains 33.9% and 320x320 input yields 30.4%. These accuracies are significant improvements over current mainstream approaches such as SSD [1], DSSD [8], FPN [11], and YOLO [6].
F3N: Full Feature Fusion Network for Object Detection. Gang Wang, Tang Kai, Kazushige Ouchi. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446152
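The "concatenate and interpolate" fusion step described in the abstract can be sketched minimally as follows. This is an illustration under assumed details (nearest-neighbour upsampling, nested-list feature maps, hypothetical function names), not the authors' implementation:

```python
# Sketch of multi-scale feature fusion in the spirit of F3N: bring a coarse
# high-level feature map to the resolution of a fine low-level map by
# interpolation, then concatenate along the channel axis.

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a feature map given as
    a list of channels, each channel a list of rows."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]  # repeat columns
            for _ in range(factor):                          # repeat rows
                rows.append(list(wide))
        out.append(rows)
    return out

def fuse(low_level, high_level, factor):
    """Upsample the high-level map, then concatenate channel-wise."""
    upsampled = upsample_nearest(high_level, factor)
    return low_level + upsampled  # channel concatenation
```

For example, fusing a 1-channel 4x4 low-level map with a 1-channel 2x2 high-level map at factor 2 yields a 2-channel 4x4 map.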
Enormous numbers of news articles are generated by different news agencies. The variation in journalistic content and the online availability of news make it difficult to monitor and interpret news in real time. Organizing news articles plays a crucial role in their consumption and interpretation. Our work assists end users by grouping news articles by story. We present a novel approach that groups news articles based on a multi-level embedding representation of articles, coupled with a standard TF-IDF score computed over named entities. Our results show that combining syntactic (TF-IDF) and semantic (BERT) representations can boost performance on the news grouping task. We also experiment with transfer learning and fine-tuning of state-of-the-art BERT models for the task of document similarity, and use the output embeddings as document representations.
Grouping news events using semantic representations of hierarchical elements of articles and named entities. Abhishek Desai, Prateek Nagwanshi. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446399
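The idea of combining a semantic embedding similarity with a named-entity TF-IDF similarity can be sketched as below. The interpolation weight, the stubbed dense embeddings, and all function names are assumptions for illustration; the paper's actual scoring may differ:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def tfidf_vec(entities, vocab, doc_freq, n_docs):
    """TF-IDF vector over the named-entity vocabulary."""
    tf = Counter(entities)
    return [tf[e] * math.log(n_docs / doc_freq[e]) for e in vocab]

def article_similarity(emb_a, emb_b, ents_a, ents_b, doc_freq, n_docs, alpha=0.5):
    """Blend semantic (embedding) and syntactic (entity TF-IDF) similarity."""
    vocab = sorted(doc_freq)
    sem = cosine(emb_a, emb_b)
    syn = cosine(tfidf_vec(ents_a, vocab, doc_freq, n_docs),
                 tfidf_vec(ents_b, vocab, doc_freq, n_docs))
    return alpha * sem + (1 - alpha) * syn
```

In practice `emb_a` and `emb_b` would come from a fine-tuned BERT encoder; here they are plain vectors so the blending logic stands alone.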
To compare various techniques, a common platform is generally used into which the user imports a text dataset. Another approach evaluates against a gold standard for a specific task, but a balanced common-language corpus is rarely used. We choose the Corpus of Contemporary American English (COCA) as a balanced reference corpus and split it into categories, such as topics and genres, to which we apply families of feature extraction and machine learning algorithms. We found that Stanford CoreNLP was faster and more accurate than NLTK, as well as more reliable and easier to understand. The clustering results show that higher modularity aids interpretation. For genre and topic classification, all techniques achieved relatively high scores, though below the state-of-the-art scores reported on challenge text datasets; Naïve Bayes outperformed the other alternatives. We hope that balanced corpora from a variety of vernacular (or low-resource) languages can be used as references to assess the efficiency of the wide diversity of state-of-the-art text mining tools.
Exploration of a Balanced Reference Corpus with a Wide Variety of Text Mining Tools. Nicolas Turenne, Bokai Xu, Xinyue Li, Xindi Xu, Hongyu Liu, Xiaolin Zhu. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446192
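The best-performing classifier in the study, Naïve Bayes, is simple enough to sketch from scratch. The tiny genre-classification setup below is a minimal multinomial model with Laplace smoothing, purely illustrative and unrelated to the paper's actual features:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns model statistics."""
    label_counts = Counter(lbl for _, lbl in docs)
    word_counts = defaultdict(Counter)   # per-label token counts
    vocab = set()
    for tokens, lbl in docs:
        word_counts[lbl].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab, len(docs)

def predict_nb(model, tokens):
    """Pick the label maximising log prior + log likelihoods (Laplace-smoothed)."""
    label_counts, word_counts, vocab, n = model
    best, best_lp = None, float("-inf")
    for lbl, c in label_counts.items():
        lp = math.log(c / n)
        total = sum(word_counts[lbl].values())
        for t in tokens:
            lp += math.log((word_counts[lbl][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = lbl, lp
    return best
```

With two one-document genres, a query sharing vocabulary with the "news" document is classified as news.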
Multi-turn dialogue systems play an important role in intelligent interaction. In particular, response generation in a multi-turn conversation system is a challenging subtask that aims to generate more diverse and contextually relevant responses. Most methods focus on the sequential connections between sentences using hierarchical frameworks and attention mechanisms, but neglect the overall semantic level, such as topical information, which leads to an incomplete understanding of the dialogue history. In this paper, we propose a context-augmented model, named TGMA-RG, which leverages the conversational context to promote the interactivity and persistence of multi-turn dialogues through a topic-guided multi-head attention mechanism. Specifically, we extract topics from the conversational context and design a hierarchical encoder-decoder model with a multi-head attention mechanism, using the topic vectors as attention queries to obtain the corresponding weights between each utterance and each topic. Our experimental results on two publicly available datasets show that TGMA-RG outperforms other baselines in terms of BLEU-1, BLEU-2, Distinct-1, Distinct-2, and PPL.
Leveraging Different Context for Response Generation through Topic-guided Multi-head Attention. Weikang Zhang, Zhanzhe Li, Yupu Guo. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446168
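The core mechanism, using a topic vector as the attention query over utterance representations, can be sketched for a single head as follows. Vector dimensions and names are assumptions; the model itself is multi-head and learned:

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topic_attention(topic_vec, utterance_vecs):
    """Topic vector acts as the query; each utterance vector is a key/value.
    Returns per-utterance weights and the weighted context vector."""
    scores = [sum(q * k for q, k in zip(topic_vec, u)) for u in utterance_vecs]
    weights = softmax(scores)
    dim = len(utterance_vecs[0])
    context = [sum(w * u[i] for w, u in zip(weights, utterance_vecs))
               for i in range(dim)]
    return weights, context
```

Utterances better aligned with the topic receive higher weight, so the decoder's context emphasises topically relevant history.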
Traditional testing services incur high cost and low efficiency because of the expenditure on testing tools and the constraints of geographical location. A cloud testing service platform (CTSP) uses cloud infrastructure to deliver testing services, leading to a more cost-effective testing solution. However, intelligently matching the various testing services to testing demands is a common issue and goal for CTSPs. This paper investigates a semantic demand-service matching method for CTSPs. Considering the diverse, heterogeneous, and dynamic characteristics of cloud testing services, an Input, Output, Precondition, Effect (IOPE) matching model based on the Web Ontology Language for Services (OWL-S) is proposed, and a three-phase matching process is developed, consisting of parameter matching, attribute matching, and global matching. A quantitative method is put forward to compute the matching degree between a testing service and a testing demand during the matching process. Finally, the effectiveness and feasibility of the proposed method are demonstrated through a case study.
A Semantic Demand-Service Matching Method based on OWL-S for Cloud Testing Service Platform. Qing Xia, Chun-Xu Jiang, Chuan Yang, Hao Huang. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446136
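A three-phase IOPE matching score of the kind described above might be sketched as below. The set-overlap degree, the phase weights, and the QoS term are all illustrative assumptions, not the paper's quantitative method or the OWL-S semantics:

```python
def set_match(required, offered):
    """Degree to which the offered concept set covers the required one."""
    if not required:
        return 1.0
    return len(set(required) & set(offered)) / len(set(required))

def iope_match(demand, service, weights=(0.4, 0.3, 0.3)):
    """Three-phase matching degree between a testing demand and a service."""
    # Phase 1: parameter matching on inputs and outputs.
    param = (set_match(demand["inputs"], service["inputs"]) +
             set_match(demand["outputs"], service["outputs"])) / 2
    # Phase 2: attribute matching on preconditions and effects.
    attr = (set_match(demand["preconditions"], service["preconditions"]) +
            set_match(demand["effects"], service["effects"])) / 2
    # Phase 3: global matching folds in a platform-level score (e.g. QoS);
    # the weighting scheme here is a placeholder.
    w_param, w_attr, w_qos = weights
    return w_param * param + w_attr * attr + w_qos * service.get("qos", 0.0)
```

A demand whose inputs, outputs, preconditions, and effects are all covered by a service with a perfect QoS score yields a matching degree of 1.0.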
With the adoption of deep neural networks for pattern classification, such as recognizing handwritten digits on cheques, object classification for automated surveillance, and autonomous vehicles, the problem of DNNs confronting malicious inputs has become a hot topic. In this paper, we introduce MTDNNF, a security-enhanced classification framework for DNNs based on moving target defense. We present three pivotal characteristics that realize the framework (heterogeneity, selectivity, and adaptability), which enable MTDNNF and guarantee security and veracity. We also analyze the security and performance of MTDNNF. These analyses show that MTDNNF provides significant security improvements against malicious inputs, while the extra performance cost is negligible under both large-scale and minimal scenarios.
MTDNNF: Building the Security Framework for Deep Neural Network by Moving Target Defense. Weiwei Wang, Xinli Xiong, Songhe Wang, Jingye Zhang. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446178
Accurate customer classification can help companies save costs and create profits more effectively. Few previous studies use spatio-temporal data for customer classification. In this paper, we put forward a hybrid classification method named MDF, based on a transition probability matrix and Deep Forest, to improve customer classification performance. The novelty of the proposed method is that it converts spatio-temporal data into a transition probability matrix and then adopts Deep Forest to classify customers into different types. Experiments on a real-world customer classification task from a retail company were conducted, comparing MDF with several benchmark methods. Experimental results show that MDF performs better than the other techniques. The new customer classification method provides a useful tool for customer relationship management.
Customer classification based on spatial transition probability and Deep Forest. Yanbing Liu, Xiang Shi, Feijie Huang, Senyou Yang, Qiqi Fan, B. Zhu. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446171
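Building a transition probability matrix from spatio-temporal visit sequences, the first stage of the MDF pipeline described above, can be sketched as follows. The zone names and input format are invented for illustration:

```python
from collections import defaultdict

def transition_matrix(sequences, states):
    """Estimate row-normalised transition probabilities between states
    from lists of consecutively visited states (e.g. store zones)."""
    counts = {s: defaultdict(int) for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):   # consecutive visit pairs
            counts[a][b] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0)
                     for t in states}
    return matrix
```

The flattened rows of this matrix would then serve as per-customer features for a downstream classifier such as Deep Forest.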
A hierarchical braking control strategy based on analysis of drivers' braking acceleration is proposed in this paper. First, vehicle state data during braking are collected with a driving simulator. Then, by analyzing the vehicle's acceleration during braking, the driver's desired acceleration during collision avoidance is determined, and TTC (time-to-collision) thresholds are divided according to this desired acceleration value. Two-level warning and two-level braking collision avoidance strategies are designed based on a second-order collision time model. Finally, the overall simulation model of the collision warning system is constructed in Simulink/CarSim. Co-simulation test results demonstrate that the system's hierarchical braking and warning strategy can effectively avoid crashes.
Longitudinal collision warning system based on driver braking characteristics. Zhifeng Han, Xu Li, Jianchun Wang. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446141
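A tiered TTC-based decision rule of the kind described above can be sketched as follows. Note the paper uses a second-order collision time model and derives its thresholds from measured driver decelerations; the first-order TTC and the numeric thresholds below are placeholders:

```python
def time_to_collision(gap_m, closing_speed_mps):
    """First-order TTC: gap divided by closing speed.
    (A second-order model would also account for relative acceleration.)"""
    if closing_speed_mps <= 0:
        return float("inf")   # not closing, no collision risk
    return gap_m / closing_speed_mps

def braking_level(ttc, ttc_warn2=4.0, ttc_warn1=3.0,
                  ttc_brake1=2.0, ttc_brake2=1.2):
    """Map TTC to the two warning and two braking tiers.
    Threshold values are illustrative placeholders."""
    if ttc > ttc_warn2:
        return "no action"
    if ttc > ttc_warn1:
        return "early warning"
    if ttc > ttc_brake1:
        return "urgent warning"
    if ttc > ttc_brake2:
        return "partial braking"
    return "full braking"
```

For example, a 30 m gap closed at 10 m/s gives a TTC of 3 s, which falls in the urgent-warning tier under these placeholder thresholds.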
In 3D point clouds, a superpoint is a set of points that share common characteristics. Semantically pure superpoints can greatly reduce the number of points while ensuring that points in the same superpoint carry common semantic information. In this paper, we propose an end-to-end method for generating semantically pure superpoints. Specifically, we first use a light PointNet-like network to embed low-dimensional point clouds into feature space to obtain semantic information. Next, we use farthest point sampling (FPS) to sample K points as initial cluster centers. For each center, we cluster points by jointly considering spatial and feature space. After clustering, we update the feature of each cluster center by averaging the point features within the cluster. By iteratively clustering and updating the cluster features, we obtain coarse superpoints, which still contain a few incorrectly clustered points. To eliminate these, we finally use breadth-first search (BFS) to find and fuse them, obtaining fine superpoints and improving semantic purity. Extensive experiments on S3DIS and ScanNet demonstrate the effectiveness of the proposed method; furthermore, we achieve state-of-the-art results on both datasets.
Deep Learning on Superpoint Generation with Iterative Clustering Network. Jianlong Yuan, Jin Xie. In: Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, 24 December 2020. DOI: 10.1145/3446132.3446139
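The farthest point sampling step used above to seed the cluster centers is a standard greedy algorithm and can be sketched directly. The choice of the first point and the use of squared Euclidean distance are common conventions rather than details taken from the paper:

```python
def farthest_point_sampling(points, k):
    """Greedy FPS: start from the first point, then repeatedly pick the
    point farthest from the already-chosen centers.
    points: list of coordinate tuples; returns k center indices."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [0]
    # Squared distance from each point to its nearest chosen center so far.
    dists = [d2(points[0], p) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=lambda i: dists[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            dists[i] = min(dists[i], d2(points[nxt], p))
    return centers
```

Because each new center maximises the distance to all previous ones, the K seeds spread evenly over the cloud, which is why FPS gives well-separated initial cluster centers.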