Ana Fernandes, Margarida Figueiredo, J. Neves, H. Vicente
On May 25, 2018, the General Data Protection Regulation (GDPR) became applicable in the European Union, making it essential to monitor compliance by all organizations, especially those working in the health sector. Monitoring compliance is, however, a very difficult task. To meet this challenge, a practicable problem-solving methodology was developed and tested, leading to a soft computing approach based on Artificial Neural Networks. Data were collected through a questionnaire survey in which 156 employees participated. The proposed system has an accuracy of about 90% and can diagnose a laboratory's weaknesses and encourage future improvements to ensure a high level of data protection.
{"title":"An Assessment of Data Guidelines in Cryopreservation Laboratories","authors":"Ana Fernandes, Margarida Figueiredo, J. Neves, H. Vicente","doi":"10.1145/3459104.3459200","DOIUrl":"https://doi.org/10.1145/3459104.3459200","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121116770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates the influence of helium mixed with argon as a shielding gas in gas tungsten arc welding (GTAW) of thin aluminium sheets. Pure argon was compared with an argon mixture containing 50% helium. Welding speed and welding current were varied during welding of 2 mm thick EN AW 5754 H22 test coupons. The heat input was calculated and compared for all welds. Weld geometry was analysed on macro sections of the welds, and bend tests were carried out according to the standard EN ISO 5173:2012. The results show that the type of shielding gas affects weld geometry and structural deformation. The argon-helium mixture has higher thermal conductivity, so welds made with the same calculated heat input had different weld geometry. These results can serve as reference data for GTAW of aluminium alloy EN AW 5754 H22.
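The abstract does not state how the heat input was calculated. A minimal sketch, assuming the commonly used arc-energy form Q = k · U · I / v, is given below. The efficiency factor k = 0.6 (a value often quoted for GTAW) and all welding parameters are illustrative assumptions, not values from the paper.

```python
def heat_input_kj_per_mm(voltage_v, current_a, travel_speed_mm_s, efficiency=0.6):
    """Heat input per unit weld length, Q = k * U * I / v, returned in kJ/mm.

    efficiency (k) is an assumed thermal efficiency factor for GTAW.
    """
    if travel_speed_mm_s <= 0:
        raise ValueError("travel speed must be positive")
    joules_per_mm = efficiency * voltage_v * current_a / travel_speed_mm_s
    return joules_per_mm / 1000.0


# Hypothetical parameters: same arc voltage and current, two welding speeds.
q_slow = heat_input_kj_per_mm(voltage_v=12.0, current_a=100.0, travel_speed_mm_s=4.0)
q_fast = heat_input_kj_per_mm(voltage_v=12.0, current_a=100.0, travel_speed_mm_s=6.0)
print(f"{q_slow:.3f} kJ/mm at 4 mm/s, {q_fast:.3f} kJ/mm at 6 mm/s")
```

Under this formula the shielding gas does not appear as a term, which is consistent with the abstract's observation that welds with the same calculated heat input can still differ in geometry.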
{"title":"The Influence of Shielding Gas on GTAW Welding of Aluminium Thin Sheets","authors":"S. Solic, M. Bušić","doi":"10.1145/3459104.3459158","DOIUrl":"https://doi.org/10.1145/3459104.3459158","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126237980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several approaches exist for assigning a part-of-speech tag to each word of a text, a task known as Part-of-Speech (POS) tagging. This article presents a comprehensive study and comparison of two POS tagging techniques for Nepali text: one based on the Hidden Markov Model (HMM) and one based on the General Regression Neural Network (GRNN). The two taggers resolve ambiguity in the POS tagging of Nepali text through different approaches. The taggers are evaluated on corpora developed and provided by TDIL (Technology Development for Indian Languages). The implementation uses the Python and Java programming languages and the NLTK library. Both taggers achieve 100 percent accuracy for known words (with no ambiguity). For ambiguous words, accuracy is 58.29 percent (HMM) and 60.45 percent (GRNN), and for non-ambiguous unknown words it is 85.36 percent (GRNN).
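To illustrate how an HMM tagger resolves an ambiguous word, the sketch below runs standard Viterbi decoding over a first-order HMM. It is not the authors' implementation: the tag set, the English toy vocabulary, and all probabilities are hypothetical, chosen only so that the word "fish" is ambiguous between NOUN and VERB.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for words under a first-order HMM."""

    def lp(p):
        # Log-probability with a small floor so unseen events do not yield log(0).
        return math.log(max(p, floor))

    # best[i][t] = (log-score of the best path ending in tag t at position i, previous tag)
    best = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for word in words[1:]:
        row = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            row[t] = (score + lp(emit_p[t].get(word, 0.0)), prev)
        best.append(row)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model (not estimated from the TDIL corpora).
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.8, "VERB": 0.2}
TRANS_P = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT_P = {"NOUN": {"dogs": 0.5, "fish": 0.5}, "VERB": {"fish": 0.6, "bark": 0.4}}

print(viterbi(["dogs", "fish"], TAGS, START_P, TRANS_P, EMIT_P))
print(viterbi(["fish", "bark"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this toy model the same word "fish" is tagged VERB after "dogs" but NOUN before "bark", because the transition probabilities, and not the emission probabilities alone, decide between the two readings.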
{"title":"Probabilistic and Neural Network Based POS Tagging of Ambiguous Nepali text: A Comparative Study","authors":"A. Pradhan, A. Yajnik","doi":"10.1145/3459104.3459146","DOIUrl":"https://doi.org/10.1145/3459104.3459146","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115263616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ehsan Hadian Haghighi, Raoul C.Y. Nuijten, Pieter M.E. Van Gorp
Tailoring apps to user traits has attracted tremendous interest in mHealth app development, and understanding a user's personality is a key challenge in that context. This challenge is typically addressed with classic surveys, which place a high burden on app users. This study aims to reduce the response burden of personality tests by introducing a model that predicts user personality from the digital footprints of app usage. However, skipping surveys completely turns out to undermine prediction accuracy. Therefore, this paper conceptualizes a hybrid framework that combines user event data with surveys that have fewer questions than conventional ones. The proposed method demonstrates a promising trade-off between the simplicity of using user event data and the accuracy of validated survey methods: when the hybrid method is applied to a retrospective case study, accuracy is higher than with event data alone, and the number of survey questions needed is significantly lower. Since this is a novel method, we expect that results will strengthen as larger data sets become available over time.
{"title":"Towards Hybrid Profiling: Combining Digital Phenotyping with Validated Survey Questions to Balance Data Entry Effort with Predictive Power","authors":"Ehsan Hadian Haghighi, Raoul C.Y. Nuijten, Pieter M.E. Van Gorp","doi":"10.1145/3459104.3459199","DOIUrl":"https://doi.org/10.1145/3459104.3459199","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115368317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the development of information technology in the era of big data, the traditional education model based on offline courses can no longer meet people's needs, and online education (i.e., E-Learning) has become a new education model. Blockchain offers decentralization, traceability, collective maintenance, data transparency, and tamper resistance, which make it an ideal solution for the existing problems of online education. In this paper, we propose a new decentralized system called ELMarket (ELM), which aims to address the deficiencies of existing online education platforms: students have insufficient power as users, participants do not interact enough, and the copyright of educational resources is poorly protected. ELM builds a private chain on Ethereum called "Intelligence Chain". We issue circulating tokens called ELStone (ELS) on the "Intelligence Chain" as an incentive to users. The system also proposes and widely applies the concept of a "Smart Market". To protect teachers' creative rights and stimulate students' interest in learning, we add several mechanisms, such as originality detection, review of users' behavior, and credit exchange. In addition, we combine IPFS with a relational database for storage, compensating for the blockchain's inability to store large files. As a result, ELM establishes a safe, transparent, efficient, and tamper-resistant online education ecosystem.
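The off-chain storage idea can be sketched as follows: the large file lives in a content-addressed store, and only its digest is recorded on the ledger, so later tampering with the file is detectable. This is an illustration under stated assumptions, not ELM's implementation. `OffChainStore` is a hypothetical in-memory stand-in for IPFS, the `ledger` list stands in for on-chain records, and a plain SHA-256 hex digest is used where real IPFS uses multihash-based content identifiers.

```python
import hashlib


class OffChainStore:
    """Hypothetical in-memory stand-in for a content-addressed store such as IPFS."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def verify(store: OffChainStore, recorded_digest: str) -> bool:
    """Check that the stored file still matches the digest recorded on the ledger."""
    data = store.get(recorded_digest)
    return hashlib.sha256(data).hexdigest() == recorded_digest


store = OffChainStore()
ledger = []  # stand-in for on-chain records; holds only small metadata

digest = store.put(b"Lecture 1: introduction to blockchain (large video file)")
ledger.append({"owner": "teacher-1", "digest": digest})

print(verify(store, ledger[0]["digest"]))
```

Because the ledger entry holds only a fixed-size digest, the size of the course material does not affect the amount of data written to the chain.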
{"title":"ELMarket: An E-Learning System Based on Blockchain","authors":"Dongshuang Guo, W. Shi","doi":"10.1145/3459104.3459202","DOIUrl":"https://doi.org/10.1145/3459104.3459202","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115719546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Currently deployed Internet of Things (IoT) systems consist of low-powered devices equipped with sensors that collect data. The collected data is stored in use-case-specific applications, which are connected through application-layer gateways that allow the devices to reach third-party cloud storage platforms for further processing. This stratified architecture has created data silos that introduce complexities such as limited user control and a lack of solicitation regarding the use of user data. The constant proliferation of IoT devices in smart cities, which include smart university campuses (SUCs), has created a need for data-centric IoT architecture models. This paper proposes a blockchain-based architecture model, specifically the distributed ledger of the Ethereum blockchain combined with the Proof of Authority (POA) consensus mechanism, as a potential basis for a data-centric proof-of-concept architecture. The proposed model will be tested against application-specific use cases in a simulated environment, within the context of an SUC that is part of a smart city.
{"title":"Exploring the Integration of Blockchain Technology and IoT in a Smart University Application Architecture","authors":"Siphamandla Mjoli, N. Dlodlo","doi":"10.1145/3459104.3459153","DOIUrl":"https://doi.org/10.1145/3459104.3459153","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":" 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113950932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the last few years, car rental businesses around the world have grown considerably. Today, the main problems in renting cars are cumbersome contracting and inconvenient tracking of car status. To track cars, we need a system that receives raw data from the cars and sends it to a cloud server in real time. Meanwhile, Internet of Things (IoT) technology is being applied in many industries, such as medicine, agriculture, and intelligent transportation. Hence, the car sharing business can be improved by merging IoT technology with traditional methods. CoBox is an IoT device designed to implement the tracking system. It receives data from and transmits data to the car through the OBD-II port. An ESP32 handles monitoring, receiving locations from the GPS module, managing the power supply, and connecting to the Firebase server. With these capabilities, the car tracking system can collect the required data and lock or unlock the doors in real time. The system currently supports only the Toyota Altis 2016; we plan to verify it with other car models in the future.
{"title":"The Development of Car Tracking System for Car Sharing Business by Using OBD-II Port","authors":"Sorawit Sakarin, G. Phanomchoeng","doi":"10.1145/3459104.3459134","DOIUrl":"https://doi.org/10.1145/3459104.3459134","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129954272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving the competitiveness of Internet service providers (ISPs) has always been a hot topic in communication research. Because video makes up the bulk of network traffic, providing video users with highly satisfactory network services enhances the user experience and thus improves an ISP's competitiveness. Some user behaviours affect the detection of stalling events. Therefore, this paper proposes a video stalling decision model that builds on an HTTPS-based video stalling detection method. The model mainly uses the source IP, packet length, packet type, and basic information of the packet data. Even when the HTTPS traffic is not decrypted, the model can use this limited information to identify user behaviour and perform stalling detection, thereby reducing the misjudgement rate of stalling detection.
{"title":"Video Stalling Decision Method","authors":"C. Wang, Na Wang","doi":"10.1145/3459104.3459133","DOIUrl":"https://doi.org/10.1145/3459104.3459133","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130849963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Muneeb, Su-Chong Joo, Gyu-Sung Ham, Kwangman Ko
Edge-fog computing is closely related to IoT, 5G, blockchain, and various other new technologies that are being actively studied in the fields of smart cities, machine vision, and smart industries. In particular, the rapid growth and spread of high-performance sensing technologies able to acquire high-level information underline the need for further development of platforms such as edge-fog computing, which support rapid processing and real-time response for IoT-based applications. In this paper, we introduce our work in progress on an intelligent edge-to-edge and edge-to-fog collaborative computing platform, called E-BItE for short. The platform enables collaborative processing in an edge-fog environment through elasticity, and uses blockchain technology to verify communication functionality, performance, and security for smart city, smart machine vision, and smart industry applications.
{"title":"An Elastic Blockchain IoT-based Intelligent Edge-Fog Collaboration Computing Platform","authors":"M. Muneeb, Su-Chong Joo, Gyu-Sung Ham, Kwangman Ko","doi":"10.1145/3459104.3459178","DOIUrl":"https://doi.org/10.1145/3459104.3459178","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117239866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the need for flexible, real-time Natural Language Processing (NLP) pipelines has become more pressing. Real-time data sources such as Twitter require real-time text analysis platforms. In addition, because a wide range of NLP toolkits and libraries exist in a variety of programming languages, a streaming platform is needed to combine and integrate modules from different toolkits. This study proposes a real-time architecture that uses Apache Storm and Apache Kafka to apply different NLP tasks to streams of textual data. The architecture allows developers to plug in NLP modules written in different programming languages. To evaluate the performance of the architecture, a series of experiments is conducted with OpenNLP, Fasttext, and SpaCy modules for Bahasa Malaysia and English. The results show that Apache Storm achieved the lowest latency compared with Trident and the baseline experiments.
{"title":"Real-time Text Stream Processing: A Dynamic and Distributed NLP Pipeline","authors":"Mohammad Arshi Saloot, D. Pham","doi":"10.1145/3459104.3459198","DOIUrl":"https://doi.org/10.1145/3459104.3459198","url":null,"PeriodicalId":142284,"journal":{"name":"2021 International Symposium on Electrical, Electronics and Information Engineering","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124125189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}