Pub Date: 2018-12-13; DOI: 10.25073/2588-1086/VNUCSCE.211
C. Vo, T. Cao, Bao Ho
Abbreviations are widely used in clinical notes because the notes are often written under high pressure, with little writing time and a drive to simplify medical records. These abbreviations reduce the clarity of the records and hamper computer-based processing of them. In this paper, we propose a solution to the abbreviation identification task on clinical notes in a practical setting where only a few clinical notes have been labeled and many more remain unlabeled. Our solution follows a semi-supervised learning approach that uses level-wise feature engineering to construct an abbreviation identifier from a small set of labeled clinical texts and a larger set of unlabeled clinical texts. A semi-supervised learning algorithm, Semi-RF, and its adaptive version, Weighted Semi-RF, are proposed in the self-training framework using random forest models and Tri-training. Weighted Semi-RF differs from Semi-RF in that it is equipped with a new weighting scheme that adapts to the current labeled data set. With their parameter-free settings, the proposed algorithms are practical for building an effective identifier that finds abbreviations automatically in clinical texts. Their effectiveness is confirmed by better Precision and F-measure values in various experiments on real Vietnamese clinical notes. Compared to the existing solutions, our solution is novel for automatic abbreviation identification in clinical notes. Its results can lay the basis for determining the full form of each correctly identified abbreviation, which in turn would enhance the readability of the records.
Keywords: Electronic medical record, Clinical note, Abbreviation identification, Semi-supervised learning, Self-training, Random forest.
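The self-training idea mentioned in the abstract can be illustrated with a generic loop. This is only a minimal sketch and not the authors' Semi-RF or Weighted Semi-RF: the toy nearest-centroid classifier, the one-dimensional feature, the `min_conf` threshold and all function names are assumptions made for illustration (the paper itself uses random forests, Tri-training and parameter-free settings).

```python
def fit_centroids(labeled):
    """Toy stand-in classifier (assumption): nearest centroid on a 1-D feature.
    Expects at least two distinct labels in `labeled`."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    centroids = {y: total / count for y, (total, count) in sums.items()}

    def model(x):
        # Rank labels by distance; confidence grows as the runner-up gets farther away.
        dists = sorted((abs(x - c), y) for y, c in centroids.items())
        (d0, y0), (d1, _) = dists[0], dists[1]
        conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
        return y0, conf

    return model


def self_train(labeled, unlabeled, fit, min_conf=0.75, max_rounds=10):
    """Generic self-training: repeatedly pseudo-label the confident examples."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = model(x)
            if conf >= min_conf:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
        if not pool:
            break
    return fit(labeled), labeled


# Hypothetical feature: a score in [0, 1] where high values look like abbreviations.
seed = [(0.0, "word"), (1.0, "abbr")]
model, grown = self_train(seed, [0.1, 0.9, 0.45, 0.2], fit_centroids)
```

In this toy run the ambiguous example (0.45) is never pseudo-labeled, which is the point of the confidence check: only examples the current model is sure about are added to the labeled set.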
{"title":"Abbreviation Detection in Vietnamese Clinical Texts","authors":"C. Vo, T. Cao, Bao Ho","doi":"10.25073/2588-1086/VNUCSCE.211","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.211","url":null,"abstract":"Abbreviations have been widely used in clinical notes because generating clinical notes often takes place under high pressure with lack of writing time and medical record simplification. Those abbreviations limit the clarity and understanding of the records and greatly affect all the computer-based data processing tasks. In this paper, we propose a solution to the abbreviation identification task on clinical notes in a practical context where a few clinical notes have been labeled while so many clinical notes need to be labeled. Our solution is defined with a semi-supervised learning approach that uses level-wise feature engineering to construct an abbreviation identifier, from using a small set of labeled clinical texts and exploiting a larger set of unlabeled clinical texts. A semi-supervised learning algorithm, Semi-RF, and its advanced adaptive version, Weighted Semi-RF, are proposed in the self-training framework using random forest models and Tri-training. Weighted Semi-RF is different from Semi-RF as equipped with a new weighting scheme via adaptation on the current labeled data set. The proposed semi-supervised learning algorithms are practical with parameter-free settings to build an effective abbreviation identifier for identifying abbreviations automatically in clinical texts. Their effectiveness is confirmed with the better Precision and F-measure values from various experiments on real Vietnamese clinical notes. Compared to the existing solutions, our solution is novel for automatic abbreviation identification in clinical notes. Its results can lay the basis for determining the full form of each correctly identified abbreviation and then enhance the readability of the records. 
Keywords: Electronic medical record, Clinical note, Abbreviation identification, Semi-supervised learning, Self-training, Random forest.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126618446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-13; DOI: 10.25073/2588-1086/VNUCSCE.201
Nguyen Thanh Nhan, Do Thanh Binh, Nguyen Hoang, Vu Hai, Tran Thi Thanh Hai, L. Lan
This paper describes several fusion techniques for achieving high-accuracy species identification from images of different plant organs. Given a series of images of different organs, such as branch, entire plant, flower, or leaf, we first extract confidence scores for each single organ using a deep convolutional neural network. Then, various late fusion approaches, including conventional transformation-based approaches (sum rule, max rule, product rule), a classification-based approach (support vector machine), and our proposed hybrid fusion model, are deployed to determine the identity of the plant of interest. For single-organ identification, two schemes are proposed. The first scheme uses one convolutional neural network (CNN) for each organ, while the second trains one CNN for all organs. Two well-known CNNs (AlexNet and ResNet) are chosen in this paper. We evaluate the performance of the proposed method on a large number of images of 50 species collected from two primary sources: the PlantCLEF 2015 dataset and Internet resources. The experiments show that the fusion techniques clearly outperform identification from individual organs. At rank-1, the highest species identification accuracy of a single organ is 75.6% for flower images, whereas applying fusion to leaf and flower raises the accuracy to 92.6%. We also compare the fusion strategies with multi-column deep convolutional neural networks (MCDCNN) [1]. The proposed hybrid fusion scheme outperforms MCDCNN in all combinations, with a rank-1 improvement of +3.0% to +13.8% over the MCDCNN method. The evaluation datasets and the source code are publicly available.
Keywords: Plant identification, Convolutional neural network, Deep learning, Fusion.
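The transformation-based rules named in the abstract (sum, max, product) can be sketched in a few lines. This is an illustrative sketch only: the species names and score values are invented, the function name `fuse` is an assumption, and the paper's SVM-based and hybrid fusion models are not covered here.

```python
import math


def fuse(scores_per_organ, rule="sum"):
    """Combine per-organ confidence scores with a simple late-fusion rule.

    scores_per_organ: list of dicts mapping species -> confidence score,
    one dict per organ image (e.g. leaf, flower).
    Returns (best_species, fused_scores).
    """
    combine = {"sum": sum, "max": max, "product": math.prod}[rule]
    # Only species scored by every organ classifier can be fused.
    species = set.intersection(*(set(s) for s in scores_per_organ))
    fused = {sp: combine(s[sp] for s in scores_per_organ) for sp in species}
    return max(fused, key=fused.get), fused


# Hypothetical confidence scores for two organs of the same plant.
leaf = {"acer": 0.50, "quercus": 0.40, "rosa": 0.10}
flower = {"acer": 0.10, "quercus": 0.35, "rosa": 0.55}

for rule in ("sum", "max", "product"):
    best, _ = fuse([leaf, flower], rule)
    print(rule, best)
```

With these made-up scores the sum and product rules agree on one species while the max rule picks another, which shows why the choice of fusion rule matters.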
{"title":"Score-based Fusion Schemes for Plant Identification from Multi-organ Images","authors":"Nguyen Thanh Nhan, Do Thanh Binh, Nguyen Hoang, Vu Hai, Tran Thi Thanh Hai, L. Lan","doi":"10.25073/2588-1086/VNUCSCE.201","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.201","url":null,"abstract":"This paper describes some fusion techniques for achieving high accuracy species identification from images of different plant organs. Given a series of different image organs such as branch, entire, flower, or leaf, we firstly extract confidence scores for each single organ using a deep convolutional neural network. Then, various late fusion approaches including conventional transformation-based approaches (sum rule, max rule, product rule), a classification-based approach (support vector machine), and our proposed hybrid fusion model are deployed to determine the identity of the plant of interest. For single organ identification, two schemes are proposed. The first scheme uses one Convolutional neural network (CNN) for each organ while the second one trains one CNN for all organs. Two famous CNNs (AlexNet and Resnet) are chosen in this paper. We evaluate the performances of the proposed method in a large number of images of 50 species which are collected from two primary resources: PlantCLEF 2015 dataset and Internet resources. The experiment exhibits the dominant results of the fusion techniques compared with those of individual organs. At rank-1, the highest species identification accuracy of a single organ is 75.6% for flower images, whereas by applying fusion technique for leaf and flower, the accuracy reaches to 92.6%. We also compare the fusion strategies with the multi-column deep convolutional neural networks (MCDCNN) [1]. The proposed hybrid fusion scheme outperforms MCDCNN in all combinations. It obtains from + 3.0% to + 13.8% improvement in rank-1 over MCDCNN method. The evaluation datasets as well as the source codes are publicly available. 
Keywords: Plant identification, Convolutional neural network, Deep learning, Fusion.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122637236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-09-27; DOI: 10.25073/2588-1086/VNUCSCE.199
Dominika Thiem, H. Khuong
A spectrum sharing environment creates cross-interference between the licensed network and the unlicensed network. Most existing works consider unlicensed interference (i.e., interference from the unlicensed network to the licensed network) while ignoring licensed interference (i.e., interference from the licensed network to the unlicensed network). Moreover, existing channel estimation algorithms cannot estimate channel information exactly. In this paper, the impacts of licensed interference and inaccurate channel information on information security in the spectrum sharing environment are analyzed under a peak transmit power bound, a peak interference power bound, and Rayleigh fading. Toward this end, a secrecy outage probability formula is proposed in an exact form and validated by simulations. Various results illustrate that the secrecy outage probability is constant over a range of large peak interference powers or large peak transmit powers, and is severely affected by licensed interference and inaccurate channel information.
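To make the notion of validating a secrecy outage probability formula by simulation concrete, the sketch below uses the textbook Rayleigh-fading wiretap channel, not the paper's model: it ignores licensed interference, imperfect channel information and the power bounds. The function names, the average SNR values and the target secrecy rate are assumptions chosen for illustration. Under Rayleigh fading the instantaneous SNRs are exponentially distributed, which gives the closed form used here.

```python
import math
import random


def sop_closed_form(mean_snr_main, mean_snr_eve, rate):
    """Secrecy outage probability Pr(C_s < rate) for independent Rayleigh
    main and eavesdropper links (simplified textbook model, rate > 0)."""
    k = 2.0 ** rate
    return 1.0 - (mean_snr_main / (mean_snr_main + k * mean_snr_eve)) * math.exp(
        -(k - 1.0) / mean_snr_main
    )


def sop_monte_carlo(mean_snr_main, mean_snr_eve, rate, trials=100_000, seed=0):
    """Estimate the same probability by drawing exponential SNR samples."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        snr_main = rng.expovariate(1.0 / mean_snr_main)
        snr_eve = rng.expovariate(1.0 / mean_snr_eve)
        # Secrecy capacity is the non-negative gap between the two link capacities.
        secrecy = max(0.0, math.log2(1.0 + snr_main) - math.log2(1.0 + snr_eve))
        if secrecy < rate:
            outages += 1
    return outages / trials


exact = sop_closed_form(10.0, 1.0, 1.0)
simulated = sop_monte_carlo(10.0, 1.0, 1.0)
print(f"closed form: {exact:.4f}, simulation: {simulated:.4f}")
```

Agreement between the two numbers plays the same role as the simulation check described in the abstract, although the paper's exact formula covers a richer system model.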
{"title":"Impacts of Licensed Interference and Inaccurate Channel Information on Information Security in Spectrum Sharing Environment","authors":"Dominika Thiem, H. Khuong","doi":"10.25073/2588-1086/VNUCSCE.199","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.199","url":null,"abstract":"Spectrum sharing environment creates cross-interference between licensed network and unlicensednetwork. Most existing works consider unlicensed interference (i.e., interference from unlicensed networkto licensed network) while ignoring licensed interference (i.e., interference from licensed networkto unlicensed network). Moreover, existing channel estimation algorithms cannot exactly estimate channelinformation. In this paper, impacts of licensed interference and inaccurate channel information oninformation security in the spectrum sharing environment is analyzed under peak transmit power bound,peak interference power bound, and Rayleigh fading. Toward this end, a secrecy outage probabilityformula is proposed in an exact form and validated by simulations. Various results illustrate that secrecyoutage probability is constant in a range of large peak interference powers or large peak transmit powers,and is severely affected by licensed interference and inaccurate channel information.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128028184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-09-27; DOI: 10.25073/2588-1086/vnucsce.202
Dao Van Lan, Nguyen Anh Thai, Hoang Van Phuc
This paper presents a low-area, low-power AES-CCM authenticated encryption IP core with silicon demonstration in a 180 nm standard CMOS process. The proposed AES-CCM core combines a low-area 8-bit single S-box AES encryption core, an improved iterative structure and other optimized circuits. The implementation results show that the proposed AES-CCM core achieves very high resource efficiency, with an area of 6.5 kgates (gate equivalents, GE) and a low power consumption of 11.6 µW/MHz, while meeting the operating speed requirements of many applications, including IEEE 802.15.6 WBANs. Detailed implementation and optimization results are also presented and discussed.
{"title":"A Low Area, Low Power 8-bit AES-CCM Authenticated Encryption Core in 180nm CMOS Process","authors":"Dao Van Lan, Nguyen Anh Thai, Hoang Van Phuc","doi":"10.25073/2588-1086/vnucsce.202","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.202","url":null,"abstract":"This paper presents a low area, low power AES-CCM authenticated encryption IP core with silicon demonstration in 180nm standard CMOS process. The proposed AES-CCM core combines a low area 8-bit single S-box AES encryption core, improved iterative structure and other optimized circuits. The implementation results show that the proposed AES-CCM core achieves very high resource efficiency with 6.5 kgates GE and the low power consumption of 11.6 µW/MHz while meeting the requirement of the operation speed for many applications including IEEE 802.15.6 WBANs. The detail implementation and optimization results are also presented and discussed.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128577138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-09-27; DOI: 10.25073/2588-1086/VNUCSCE.174
Vu Ngoc Cham, N. Anh
A federation is usually an alliance of organisations in which users from one organisation are trusted to access resources in another organisation. The membership of federations is diverse and continually changing, so federations require distributed and dynamic security policy management. We propose an authorisation policy management model, FABACD, which simplifies the management of collaborations between organisations. It allows distributed and trusted administrators to adjust the authorisation policies in a resource-holding organisation, whilst ensuring that the latter remains in ultimate control. The net result is that a resource's authorisation system can use user credentials built from pre-existing attributes issued by any participating organisation to determine a user's access rights to the various resources, without requiring credentials to be issued that are based on federation-specific attributes. The model significantly simplifies the authorisation management process for the resource-holding organisation.
{"title":"An Authorisation Policy Management Model in Federations","authors":"Vu Ngoc Cham, N. Anh","doi":"10.25073/2588-1086/VNUCSCE.174","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.174","url":null,"abstract":"A federation is usually an alliance of organisations where users from one organisation are trusted to access resources in another organisation. The membership of federations is diverse and continually changing. Federations require distributed and dynamic security policy management to meet these challenges. We propose an authorisation policy management model, FABACD, which simplifies the management of collaborations between organisations. It allows distributed and trusted administrators to adjust the authorisation policies in a resource holding organisation, whilst ensuring that the latter remains in ultimate control. The net result is that a resource’s authorisation system is able to use user credentials built from preexisting attributes issued by any participating organisation, in order to determine a user’s access rights to the various resources, without requiring credentials to be issued that are based on federation specific attributes. The model significantly simplifies the authorisation management process for the resource holding organisation.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133985330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-09-27; DOI: 10.25073/2588-1086/VNUCSCE.198
Le Dao Thi Hue, Luong Pham Van, D. Trieu, Xiem HoangVan
Video surveillance has played an important role in public safety and privacy protection in recent years thanks to its capability for activity monitoring and content analysis. However, the volume of data produced by long hours of surveillance video is huge, which makes it less attractive for practical applications. In this paper, we propose a low-complexity yet efficient scalable video coding solution for video surveillance systems. The proposed surveillance video compression scheme provides quality scalability by following a layered coding structure that consists of one or several enhancement layers on top of a base layer. In addition, to maintain backward compatibility with current video coding standards, the state-of-the-art standard, High Efficiency Video Coding (HEVC), is employed to compress the base layer. To satisfy the low-complexity requirement of the encoder in video surveillance systems, the distributed coding concept is employed at the enhancement layers. Experiments conducted on a rich set of surveillance video data show that the proposed surveillance distributed scalable video coding (S-DSVC) solution significantly outperforms relevant video coding benchmarks, notably the SHVC standard and HEVC simulcasting, while requiring much lower computational complexity at the encoder, which is essential for practical video surveillance applications.
{"title":"Efficient and Low Complexity Surveillance Video Compression using Distributed Scalable Video Coding","authors":"Le Dao Thi Hue, Luong Pham Van, D. Trieu, Xiem HoangVan","doi":"10.25073/2588-1086/VNUCSCE.198","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.198","url":null,"abstract":"Video surveillance has been playing an important role in public safety and privacy protection in recent years thanks to its capability of providing the activity monitoring and content analyzing. However, the data associated with long hours surveillance video is huge, making it less attractive to practical applications. In this paper, we propose a low complexity, yet efficient scalable video coding solution for video surveillance system. The proposed surveillance video compression scheme is able to provide the quality scalability feature by following a layered coding structure that consists of one or several enhancement layers on the top of a base layer. In addition, to maintain the backward compatibility with the current video coding standards, the state-of-the-art video coding standard, i.e., High Efficiency Video Coding (HEVC), is employed in the proposed coding solution to compress the base layer. To satisfy the low complexity requirement of the encoder for the video surveillance systems, the distributed coding concept is employed at the enhancement layers. 
Experiments conducted for a rich set of surveillance video data shown that the proposed surveillance - distributed scalable video coding (S-DSVC) solution significantly outperforms relevant video coding benchmarks, notably the SHVC standard and the HEVC-simulcasting while requiring much lower computational complexity at the encoder which is essential for practical video surveillance applications.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116949248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-07-04; DOI: 10.25073/2588-1086/vnucsce.205
N. V. Hao, N. D. Minh, Pham Nguyen Thanh Loan
In this paper, an adaptive and wide-range output DC-DC converter designed for a lithium-ion (Li-Ion) battery charger circuit is proposed. The converter operates in continuous conduction mode (CCM) to provide an output voltage that tracks the battery voltage and a wide-range output current that meets the circuit requirements. The circuit is designed in Cadence using a 0.35-µm BCD technology. Simulation results show that the circuit operates fully in CCM with a load current from 50 mA to 1000 mA and that the output voltage ripple factor is less than 1%. Furthermore, the current supplied to the load circuit responds to the three types of Li-Ion charging current. The output voltage of the converter varies from 2.8 V to 4.5 V, corresponding to a battery voltage range of 2.5 V to 4.2 V during charging. The average power efficiency of the converter at large load current (1000 mA) reaches 94%.
{"title":"An Adaptive and Wide-Range Output DC-DC Converter for Loading Circuit of Li-Ion Battery Charger","authors":"N. V. Hao, N. D. Minh, Pham Nguyen Thanh Loan","doi":"10.25073/2588-1086/vnucsce.205","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.205","url":null,"abstract":"In this paper, an adaptive and wide-range output DC-DC converter designed for lithium-ion (Li-Ion) battery charger circuit is proposed. The converter operates in continuous conduction mode (CCM) to provide an output voltage in response to battery voltage and a wide-range output current to ensure that circuit requirements are met. This circuit is designed on Cadence using 0.35-um BCD technology. Simulation results show that the circuit fully operates in CCM mode with a load current from 50 mA to 1000 mA and output voltage ripple factor is less than 1 %. Furthermore, the current supplied to the load circuit responses to three types of Li-Ion rechargeable currents. The output voltage of the converter varies from 2.8 to 4.5 V corresponding to the voltage range of the battery being charged from 2.5 to 4.2 V. The average power efficiency of the converter in large load current mode (1000 mA) reaches 94 %.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132069761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-10; DOI: 10.25073/2588-1086/VNUCSCE.194
Trần Ngọc Hà, Le Nhu Hien, H. X. Huan
One of the main tasks of structural biology is comparing the structures of proteins. Comparisons of protein structure can reveal functional similarities. Multiple graph alignment is a useful tool for identifying functional similarities based on structural analysis. This article proposes a new algorithm for aligning protein binding sites called ACOTS-MGA. The algorithm is based on the memetic scheme: it uses the ACO method to construct a set of solutions, then applies Tabu Search to the best of them to improve the solution quality. Experimental results show that ACOTS-MGA outperforms state-of-the-art algorithms while producing alignments of better quality.
Keywords: Multiple Graph Alignment, Tabu Search, Ant Colony Optimization, local search, memetic algorithm, SMMAS pheromone update rule, protein active sites.
{"title":"A new memetic algorithm for multiple graph alignment","authors":"Trần Ngọc Hà, Le Nhu Hien, H. X. Huan","doi":"10.25073/2588-1086/VNUCSCE.194","DOIUrl":"https://doi.org/10.25073/2588-1086/VNUCSCE.194","url":null,"abstract":"One of the main tasks of structural biology is comparing the structure of proteins. Comparisons of protein structure can determine their functional similarities. Multigraph alignment is a useful tool for identifying functional similarities based on structural analysis. This article proposes a new algorithm for aligning protein binding sites called ACOTS-MGA. This algorithm is based on the memetic scheme. It uses the ACO method to construct a set of solutions, then selects the best solution for implementing Tabu Search to improve the solution quality. Experimental results have shown that ACOTS-MGA outperforms state-of-the-art algorithms while producing alignments of better quality.KeywordsMultiple Graph Alignment, Tabu Search, Ant Colony Optimization, local search, memetic algorithm, SMMAS pheromone update rule, protein active sitesReferencesE. Todd, C. A. Orengo, and J. M. Thornton, “Evolution of function in protein superfamilies, from a structural perspective,” J. Mol. Biol., vol. 307, no. 4, pp. 1113–1143, Apr. 2001.S. F. Altschul et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Res., vol. 25, pp. 3389–3402, 1997.R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Res., vol. 32, no. 5, pp. 1792–1797, Mar. 2004.J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Res., vol. 22, no. 22, pp. 4673–4680, Nov. 1994.M. Larkin, G. Blackshields, N. Brown, … R. C.-, and undefined 2007, “Clustal W and Clustal X version 2.0,” academic.oup.com.C. Notredame, D. G. 
Higgins, and J. Heringa, “T-coffee: a novel method for fast and accurate multiple sequence alignment,” J. Mol. Biol., vol. 302, no. 1, pp. 205–217, Sep. 2000.K. Sjolander, “Phylogenomic inference of protein molecular function: advances and challenges,” Bioinformatics, vol. 20, no. 2, pp. 170–179, Jan. 2004.T. Fober, M. Mernberger, G. Klebe, and E. Hüllermeier, “Evolutionary construction of multiple graph alignments for the structural analysis of biomolecules,” Bioinformatics, vol. 25, no. 16, pp. 2110–2117, 2009.M. Mernberger, G. Klebe, and E. Hullermeier, “SEGA: Semiglobal Graph Alignment for Structure-Based Protein Comparison,” IEEE/ACM Trans. Comput. Biol. Bioinforma., vol. 8, no. 5, pp. 1330–1343, Sep. 2011.D. Shasha, J. T. L. Wang, and R. Giugno, “Algorithmics and applications of tree and graph searching,” in Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS ’02, 2002, p. 39.R. V. Spriggs, P. J. Artymiuk, and P. Willett, “Searching for Patterns of Amino Acids in 3D Protein Structures,” J. Chem. Inf. Comput. Sci., vol. 43, no. 2, pp. 412–421, Mar. 2003.D. Conte, P. Foggia, C. Sansone, And M. Vento, “Thirty years of graph matching in pattern recognition,","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"369 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122343677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1900-01-01 DOI: 10.25073/2588-1086/vnucsce.293
Tran Quang Vinh, Nguyen Thi Ngoc Diep
Tables are one of the most common ways to represent structured data in documents. Existing research on image-based table structure recognition often relies on limited datasets, the largest of which, the ICDAR 19 Track B dataset, contains 3,789 human-labeled tables. The recent TableBank dataset for table structures contains 145K tables; however, the tables are labeled in an HTML tag sequence format, which impedes the development of image-based recognition methods. In this paper, we propose several processing methods that automatically convert an HTML tag sequence annotation into bounding box annotations for the table cells in a table image. By ensembling these methods, we converted 42,028 tables with high correctness, about 11 times the size of the largest existing dataset (ICDAR 19). We then demonstrate that with these bounding box annotations, a straightforward representation of objects in images, off-the-shelf deep learning models achieve much higher F1-scores for table structure recognition at many high IoU thresholds: an F1-score of 0.66 compared to the state-of-the-art 0.44 on the ICDAR 19 dataset. A further experiment shows that explicit bounding box annotation yields higher accuracy for image-based table structure recognition (70.6%) than implicit text sequence annotation (only 33.8%). The experiments show the effectiveness of our dataset, the largest to date, and open up opportunities to generalize to real-world applications. Our dataset and experimental models are publicly available at shorturl.at/hwHY3
{"title":"Automatic building of a large and complete dataset for image-based table structure recognition","authors":"Tran Quang Vinh, Nguyen Thi Ngoc Diep","doi":"10.25073/2588-1086/vnucsce.293","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.293","url":null,"PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"455 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122603050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
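The abstract for this entry reports F1-scores at IoU thresholds for predicted table-cell boxes. As a rough illustration of what such a number measures, the sketch below scores predicted boxes against ground-truth boxes at one threshold. It assumes a simple greedy one-to-one matching and `(x1, y1, x2, y2)` box tuples; it is not the ICDAR 19 evaluation protocol, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predicted, truth, threshold):
    """F1 where a prediction counts as correct if it overlaps an unmatched
    ground-truth box with IoU >= threshold (greedy matching, an assumption)."""
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box is matched at most once
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Two ground-truth cells; three predictions, one of them spurious.
truth = [(0, 0, 10, 10), (10, 0, 20, 10)]
predicted = [(0, 0, 10, 10), (10, 0, 20, 8), (30, 30, 40, 40)]
print(f1_at_iou(predicted, truth, 0.6))  # second box has IoU 0.8, so it counts
print(f1_at_iou(predicted, truth, 0.9))  # at a stricter threshold it no longer does
```

Raising the threshold can only turn matches into misses, which is why results at high IoU thresholds are the more demanding comparison.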