Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642126
Hoai Viet Nguyen, Linh Doan Bao, Hoang Viet Trinh, Hoang Huy Phan, Ta Minh Thanh
The Mobile-Captured Receipts Optical Character Recognition (MC-OCR) [14] challenge comprises two tasks: Receipt Image Quality Evaluation and Key Information Extraction. For the first task, we introduce a regression model that maps various inputs, for instance OCR output probabilities, cropped text boxes, and images, to the actual quality label. For the second task, we propose a stacked multi-model solution that combines image segmentation, image classification, text detection, text recognition, and text classification. Following this solution, we can handle various noisy receipt types: horizontal, skewed, and blurred receipts.
{"title":"MC-OCR Challenge 2021: Towards Document Understanding for Unconstrained Mobile-Captured Vietnamese Receipts","authors":"Hoai Viet Nguyen, Linh Doan Bao, Hoang Viet Trinh, Hoang Huy Phan, Ta Minh Thanh","doi":"10.1109/RIVF51545.2021.9642126","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642126","url":null,"abstract":"The Mobile capture receipts Optical Character Recognition (MC-OCR) [14] challenge deliver two tasks: Receipt Image Quality Evaluation and Key Information Extraction. In the first task, we introduce a regression model to map various inputs, for instance the probability of the output OCR, cropped text boxes, images to actual label. In the second task, we propose a stacked multi-model as a solution to solve this problem. The robust models are incorporated by image segmentation, image classification, text detection, text recognition, and text classification. Follow this solution, we can get vital tackle various noise receipt types: horizontal, skew, and blur receipt.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"105 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75689532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642085
Phuong Thi Nha, Ta Minh Thanh
In recent years, protecting the copyright of digital images has become an indispensable requirement for owners. To cope with the rapidly increasing number of attacks, many techniques have been proposed in the transform domain to ensure the quality of the watermarked image, the robustness of the extracted watermark, and the execution time. Among these techniques, LU decomposition is considered outstanding in terms of computation. However, not all square matrices have an LU decomposition. Therefore, suitable blocks need to be chosen before factorizing pixel matrices into lower and upper triangular matrices. In addition, to improve the invisibility of the watermarked image, the watermark should be embedded in one element of the L matrix instead of two elements as in previous proposals. In this paper, we propose a novel image watermarking scheme based on a strategy for selecting LU blocks and an improved embedding method. Besides, the extraction time is significantly reduced by a new solution that obtains the L(2,1) and L(3,1) elements of the L matrix without performing LU decomposition in the extraction stage. According to the experimental results, our proposed method not only achieves much better visual quality of the watermarked images but also effectively extracts the watermark under several attacks.
{"title":"A Novel Image Watermarking Scheme Using LU Decomposition","authors":"Phuong Thi Nha, Ta Minh Thanh","doi":"10.1109/RIVF51545.2021.9642085","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642085","url":null,"abstract":"In recent years, protecting copyright of digital images is an indispensable requirement for owners. To against with rapidly increasing of attacks, many techniques have been proposed in transform domain for ensuring quality of watermarked image, robustness of extracted watermark and execution time. Among these techniques, LU decomposition is considered as an outstanding technique in term of computation. However, it is that not all square matrices have an LU decomposition. Therefore, the suitable blocks need to be chosen before factorizing pixel matrices into lower and upper triangular matrix. In addition, in order to improve the invisibility of watermarked image, watermark should be embedded on one element of L matrix instead of two elements as the previous proposals. In this paper, we propose a novel image watermarking scheme which is based on strategy of LU blocks selection and an improved embedding method. Beside that, the extraction time is significantly sped up by a new solution to get out L(2,1) and L(3,1) elements of L matrix without performing LU decomposition in the extracting stage. According to the experimental results, our proposed method not only has the much better visual quality of watermarked images, but also can effectively extracts the watermark under some attacks.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73029750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642099
P. H. Pham, Bich-Ngan T. Nguyen, Canh V. Pham, Nghia D. Nghia, V. Snás̃el
In the context of viral marketing in Online Social Networks (OSNs), companies often look for a set of users (called a seed set) to initiate the spread of their product’s information so that the benefit gained exceeds a given threshold. However, in realistic scenarios, marketing strategies often change, so selecting a seed set for a single threshold is not enough to provide an effective solution. Motivated by this phenomenon, we investigate the Multiple Benefit Thresholds (MBT) problem, defined as follows: given a social network under an information diffusion model and a set of thresholds T = {T1, T2, … , Tk}, find seed sets S1, S2, … , Sk with minimal cost so that the benefits gained after the influence process are at least T1, T2, … , Tk, respectively. To solve this problem, we propose an efficient algorithm with theoretical guarantees, named Efficient Sampling for Selecting Multiple seed sets (ESSM), by developing an algorithmic framework and utilizing a sampling technique to estimate the objective function. Extensive experiments on several real networks show the effectiveness and performance of our algorithm, which outperforms other algorithms in terms of both cost and running time.
{"title":"Efficient Algorithm for Multiple Benefit Thresholds Problem in Online Social Networks","authors":"P. H. Pham, Bich-Ngan T. Nguyen, Canh V. Pham, Nghia D. Nghia, V. Snás̃el","doi":"10.1109/RIVF51545.2021.9642099","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642099","url":null,"abstract":"In the context of viral marketing in Online Social Networks (OSNs), companies often find some users (called a seed set) to initiate the spread of their product’s information so that the benefit gained exceeds a given threshold. However, in a realistic scenario, marketing strategies often change so the selection of a seed set for a particular threshold is not enough to provide an effective solution. Motivated by this phenomenon, we investigate the Multiple Benefit Thresholds (MBT), defined as follows: Given a social network under an information diffusion and a set of thresholds T = {T1, T2, … , Tk}, the problem finds seed sets S1, S2, … , Sk with the minimal cost so that their benefit gained after the influence process are at least T1, T2, … , Tk, respectively. To find the solution, we propose an efficient algorithm with theoretical guarantees, named Efficient Sampling for Selecting Multiple seed sets (ESSM) by developing an algorithmic framework and utilizing the sampling technique for estimating the objective function. We perform extensive experiments using some real networks show that the effective and performance of our algorithm, which outperforms other algorithms in term both the cost and running time.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85196814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642088
Bao Hieu Tran, Duc Viet Hoang, Nguyen Manh Hiep, Pham Ngoc Bao Anh, Hoang Gia Bao, Nguyen Duc Anh, Bui Hai Phong, T. Nguyen, Phi-Le Nguyen, Thi-Lan Le
Mobile-captured receipts OCR (MC-OCR) recognizes text from structured and semi-structured receipts and invoices captured by mobile devices. This process plays a critical role in streamlining document-intensive processes and office automation in many financial, accounting, and taxation areas. Although many efforts have been devoted to it, MC-OCR still faces significant challenges due to the complexity of mobile-captured images. First, receipts might be crumpled, or the content might be blurred. Second, unlike scanned images, photos taken by mobile devices vary widely in quality due to lighting conditions and the dynamic environments (e.g., indoor, outdoor, complex background) in which the receipts were captured. These difficulties lead to low accuracy of the recognition results. In this challenge, we target two tasks to address these issues: (1) evaluating the quality of the captured receipts, and (2) recognizing required fields of the receipts. Our idea is to leverage a multi-modal approach that takes advantage of both computer vision and natural language processing, two of the main interests of the RIVF community. This paper presents the BK-OCR team’s methodology and results in the Mobile-Captured Image Document Recognition for Vietnamese Receipts 2021 challenge.
{"title":"MC-OCR Challenge 2021: A Multi-modal Approach for Mobile-Captured Vietnamese Receipts Recognition","authors":"Bao Hieu Tran, Duc Viet Hoang, Nguyen Manh Hiep, Pham Ngoc Bao Anh, Hoang Gia Bao, Nguyen Duc Anh, Bui Hai Phong, T. Nguyen, Phi-Le Nguyen, Thi-Lan Le","doi":"10.1109/RIVF51545.2021.9642088","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642088","url":null,"abstract":"Mobile captured receipts OCR (MC-OCR) recognizes text from structured and semi-structured receipts and invoices captured by mobile devices. This process plays a critical role in streamlining document-intensive processes and office automation in many financial, accounting, and taxation areas. Although many efforts have been devoted, MC-OCR still faces significant challenges due to mobile captured images’ complexity. First, receipts might be crumpled, or the content might be blurred. Second, different from scanned images, the quality of photos taken by mobile devices shows high diversity due to the light condition and the dynamic environment (e.g., indoor, out-door, complex background, etc.) where the receipts were captured. These difficulties lead to a low accuracy of the recognition results. In this challenge, we target two tasks to address these issues, including (1) evaluating the quality of the captured receipts, and (2) recognizing required fields of the receipts. Our idea is to leverage a multi-modal approach which can take advantage of both areas: computer vision and natural language processing, two of the main interests of the RIVF community. The paper presents the BK-OCR team’s methodology and results in the Mobile-Captured Image Document Recognition for Vietnamese Receipts 2021.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"6 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84344788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642129
T. Phung, Thi Hong Thu Ma, Van Truong Nguyen, Duc-Quang Vu
Deep learning is a data-hungry technique that is more effective when applied to large datasets. However, large-scale annotated datasets are not always available, so an approach such as self-supervised learning, in which labels can be generated automatically, is essential. In this paper, we introduce a new self-supervised method, namely video denoising. This method trains an autoencoder model to restore the original videos. A second model, called the discriminator, is proposed to evaluate the quality of the videos output by the autoencoder. By reconstructing videos, the autoencoder learns both spatial and temporal relations among video frames, which makes the downstream task easier to process. In the experiments, we demonstrate that our model transfers well to the action recognition task and outperforms state-of-the-art methods on the UCF-101 and HMDB-51 datasets.
{"title":"Self-Supervised Learning for Action Recognition by Video Denoising","authors":"T. Phung, Thi Hong Thu Ma, Van Truong Nguyen, Duc-Quang Vu","doi":"10.1109/RIVF51545.2021.9642129","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642129","url":null,"abstract":"Deep learning is a data-hungry technique that is more effective when being applied to large datasets. However, large-scale annotation datasets are not always available. A new approach, such as self-supervised learning of which labels can be automatically generated, is essential. Therefore, using self- supervised learning is a new approach to state-of-the-art methods. In this paper, we introduce a new self-supervised method namely video denoising. This method requires an autoencoder model to restore original videos. The second model is proposed, which is called the discriminator. It is used for the quality evaluation of output videos from the autoencoder. By reconstructing videos, the autoencoder is learned both spatial and temporal relations of video frames to process the downstream task easily. In the experiments, we have demonstrated that our model is well transferred to the action recognition task and outperforms state- of-the-art methods on the UCF-101 and HMDB-51 datasets.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"46 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87113712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642122
T. Hoa, Val Randolf M. Madrid, Eliezer A. Albacea
In daily human activity, accidental falls are a frequent occurrence. They can happen to children, the elderly, and even adults. Early detection of human falls is the most effective way to avoid the high risk of loss of self-control, death, or injury, and it also reduces the cost to the national health system. Therefore, research and development of fall detection and rescue systems are needed. Currently, fall detection systems are mainly based on wearable, ambient, and vision sensors. Each method has certain advantages and limitations. Previous works usually focused on model size, while speed was often not considered. Therefore, studies that propose a lightweight fall detection model with lower memory and processing-time complexity yet reasonable accuracy remain promising. In this paper, a lightweight 3-dimensional model based on the MobileNet architecture is proposed for fall detection.
{"title":"A Lightweight Model for Falling Detection","authors":"T. Hoa, Val Randolf M. Madrid, Eliezer A. Albacea","doi":"10.1109/RIVF51545.2021.9642122","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642122","url":null,"abstract":"In human activities life, accidental falls are a frequent occurrence. It can happen in children, the elderly, and even adults. Early detection of human falls is the most effective way to avoid the high risk of loss of self-control, death, or injury in humans. This means also reducing the national health system’s cost. Therefore research and development of fall detection and rescue systems are needed. Currently, the fall detection system is mainly based on wearable sensors, ambient, and vision sensors. Each method has certain advantages and limitations. The previous works usually focused on size while the speed was not often considered. Therefore, studies that aim to propose a lightweight model for Fall Detection with less complexity of memory and processing time but having reasonable accuracy are still potential. A 3-dimensional lightweight model has been proposed based on MobileNet architecture for falling detection in this paper.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"32 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87144745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642093
Thi Thu Trang Nguyen, Dai Tho Nguyen, Duy Loi Vu
Malware attacks have been among the most serious threats to cyber security in the last decade. Anti-malware software can help safeguard information systems and minimize their exposure to malware. Most anti-malware programs detect malware instances based on signature or pattern matching. Data mining and machine learning techniques can be used to automatically detect models and patterns behind different types of malware variants. However, traditional machine learning techniques such as SVM, decision trees, and naive Bayes seem suitable only for detecting malicious code and are not effective enough for more complex problems such as classification. In this article, we propose a new prototype extraction method for non-traditional prototype-based machine learning classification. The prototypes are extracted using hypercuboids. Each hypercuboid covers all training data points of a malware family. We then choose the data points nearest to the hyperplanes as the prototypes. Malware samples are classified based on their distances to the prototypes. Experimental results show that our proposal achieves an F1 score of 96.5% for classification of known malware and 97.7% for classification of unknown malware, both better than the original prototype-based classification method.
{"title":"A Hypercuboid-Based Machine Learning Algorithm for Malware Classification","authors":"Thi Thu Trang Nguyen, Dai Tho Nguyen, Duy Loi Vu","doi":"10.1109/RIVF51545.2021.9642093","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642093","url":null,"abstract":"Malware attacks have been among the most serious threats to cyber security in the last decade. Antimalware software can help safeguard information systems and minimize their exposure to the malware. Most of anti-malware programs detect malware instances based on signature or pattern matching. Data mining and machine learning techniques can be used to automatically detect models and patterns behind different types of malware variants. However, traditional machine-based learning techniques such as SVM, decision trees and naive Bayes seem to be only suitable for detecting malicious code, not effective enough for complex problems such as classification. In this article, we propose a new prototype extraction method for non-traditional prototype-based machine learning classification. The prototypes are extracted using hypercuboids. Each hypercuboid covers all training data points of a malware family. Then we choose the data points nearest to the hyperplanes as the prototypes. Malware samples will be classified based on the distances to the prototypes. Experiments results show that our proposition leads to F1 score of 96.5% for classification of known malware and 97.7% for classification of unknown malware, both better than the original prototype-based classification method.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"8 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81440105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642074
Chung Tran Quang, Quang Minh Nguyen, Pham Ngoc Phuong, Quoc Truong Do
Speaker verification in noisy environments is still a challenging task. Previous studies have proposed speaker embeddings (x-vectors, ThinResNet) with classifier models (PLDA, cosine) to determine whether an audio clip is spoken by a specific speaker. The verification process consists of three steps: training an embedding extractor, enrollment, and verification. Most studies try to mitigate the noise issue by augmenting noise when training the embedding extractor. This helps the extractor tolerate more types of noise during inference. However, the classification model remains sensitive in noisy environments. In this paper, we (1) evaluate the effectiveness of different speaker embedding models and classifiers under various conditions, and (2) propose a neural network classifier on top of embedding vectors and train it with data augmentation. Experimental results indicate that the proposed pipeline outperforms the traditional pipeline by 5% F1 on a clean test set and 9% F1 on noisy test sets.
{"title":"Improving Speaker Verification in Noisy Environment Using DNN Classifier","authors":"Chung Tran Quang, Quang Minh Nguyen, Pham Ngoc Phuong, Quoc Truong Do","doi":"10.1109/RIVF51545.2021.9642074","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642074","url":null,"abstract":"Speaker verification in noisy environments is still a challenging task. Previous studies have proposed speaker embeddings (x-vectors, ThinResNet) with classifier models (PLDA, cosine) to classify if an audio is spoken by a specific speaker. The verification process is defined in 3 steps: training an embedding extractor, enrollment and verification. Most studies were trying to mitigate the noisy issue by augmenting noises in the embedding extractor. This method helps the extractor to tolerate more types of noise during the inference process. However, the classification model is still sensitive in noisy environments. In this paper, we (1) evaluate the effectiveness of different speaker embedding models and classifiers in various conditions, and (2) propose a neural network classifier on top of embedding vectors and train it with data augmentation. Experimental results indicate that the proposed pipeline outperforms the traditional pipeline by 5% F1 on a clean test set and 9% F1 on noisy test sets.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"18 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87342006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642108
T. Do
This paper presents the task of mining a Japanese-Vietnamese parallel text corpus from comparable data resources for machine translation applications. Data resources for this language pair are scarce, so parallel text should be extracted at multiple levels, sentence level and fragment level, to obtain as much data as possible. Moreover, the proposed method is independent of word order, so it can be applied to different language families. Results on the Japanese-Vietnamese Wikipedia resource show that the proposed method significantly increases the amount of extracted parallel data. The extracted multi-level parallel text contributes to the quality of machine translation as well. More than 144,000 pairs of parallel sentences and 148,000 pairs of parallel fragments have been mined and released to the research community.
{"title":"Mining Japanese-Vietnamese multi-level parallel text corpus from Wikipedia data resource","authors":"T. Do","doi":"10.1109/RIVF51545.2021.9642108","DOIUrl":"https://doi.org/10.1109/RIVF51545.2021.9642108","url":null,"abstract":"This paper presents the task of mining a Japanese - Vietnamese parallel text corpus from comparable data resources in application of machine translation. Data resource for this language pair is few and rare so the parallel text should be extracted at multi levels, sentence level and fragment level, to get as much data as possible. Moreover, the proposed method considers word order independently so it can be applied to different language families. The result applied on Japanese- Vietnamese Wikipedia resource shows that the proposed method increases significantly the number of extracted parallel data. The extracted multi-level parallel text contributes to the quality of machine translation as well. More than 144,000 pairs of parallel sentences and 148,000 pairs of parallel fragments had been mined and opened to the research community.","PeriodicalId":6860,"journal":{"name":"2021 RIVF International Conference on Computing and Communication Technologies (RIVF)","volume":"47 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90836282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-08-19. DOI: 10.1109/rivf51545.2021.9642127
"[Copyright notice]," 2021 RIVF International Conference on Computing and Communication Technologies (RIVF).