Kashif Mehboob Khan, Warda Haider, N. A. Khan, Darakhshan Saleem
The amount of data is increasing rapidly as more and more devices are linked to the Internet. Big data has a variety of uses and benefits, but it also poses numerous challenges that must be resolved to raise the quality of available services, including data integrity and security, analytics, insight, and the organization of Big data. While actively seeking the best way to manage, systemize, integrate, and secure Big data, we concluded that blockchain technology contributes significantly. Its approaches to decentralized data management, digital property reconciliation, and Internet of Things data interchange have a massive impact on how Big data will advance. Unauthorized access to the data is very difficult due to the encrypted and decentralized data preservation in the blockchain network. This paper proposes insights related to specific Big data applications that can be analyzed by machine learning algorithms, driven by data provenance, and coupled with blockchain technology to increase data trustworthiness by providing tamper-resistant information about the lineage and chronology of data records. The scenario of record tampering and big data provenance is illustrated here using a diabetes prediction model. The study carries out an empirical analysis on hundreds of patient records to perform the evaluation and to observe the impact of tampered records on big data analysis, i.e., diabetes model prediction. Through our experimentation, we may infer that under our blockchain-based system the immutable, tamper-proof metadata connected to the source and evolution of records lends verifiability to the acquired data and thus high accuracy to our diabetes prediction model.
{"title":"Big Data Provenance Using Blockchain for Qualitative Analytics via Machine Learning","authors":"Kashif Mehboob Khan, Warda Haider, N. A. Khan, Darakhshan Saleem","doi":"10.3897/jucs.93533","DOIUrl":"https://doi.org/10.3897/jucs.93533","url":null,"abstract":"The amount of data is increasing rapidly as more and more devices are being linked to the Internet. Big data has a variety of uses and benefits, but it also has numerous challenges associated with it that are required to be resolved to raise the caliber of available services, including data integrity and security, analytics, acumen, and organization of Big data. While actively seeking the best way to manage, systemize, integrate, and affix Big data, we concluded that blockchain methodology contributes significantly. Its presented approaches for decentralized data management, digital property reconciliation, and internet of things data interchange have a massive impact on how Big data will advance. Unauthorized access to the data is very challenging due to the ciphered and decentralized data preservation in the blockchain network. This paper proposes insights related to specific Big data applications that can be analyzed by machine learning algorithms, driven by data provenance, and coupled with blockchain technology to increase data trustworthiness by giving interference-resistant information associated with the lineage and chronology of data records. The scenario of record tampering and big data provenance has been illustrated here using a diabetes prediction. The study carries out an empirical analysis on hundreds of patient records to perform the evaluation and to observe the impact of tampered records on big data analysis i.e diabetes model prediction. Through our experimentation, we may infer that under our blockchain-based system the unchangeable and tamper-proof metadata connected to the source and evolution of records produced verifiability to acquired data and thus high accuracy to our diabetes prediction model. ","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"30 1","pages":"446-469"},"PeriodicalIF":0.0,"publicationDate":"2023-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78240578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several real-world phenomena, including social, communication, transportation, and biological networks, can be efficiently expressed as graphs. This enables the deployment of graph algorithms to infer information from such complex network interactions and to enhance the accuracy of graph applications, including link prediction, node classification, and clustering. However, the large size and complexity of the network data limit the efficiency of the learning algorithms in making decisions from such graph datasets. To overcome these limitations, graph embedding techniques are usually adopted. However, many studies not only assume static networks but also pay little attention to preserving the network's topological and centrality information, which is key to analyzing networks. To fill these gaps, we propose a novel end-to-end unified Topological Similarity and Centrality driven Hybrid Deep Learning model for Temporal Link Prediction (TSC-TLP). First, we extract topological similarity and centrality-based features from the raw networks. Next, we systematically aggregate these topological and centrality features to act as inputs for the encoder. In addition, we leverage a long short-term memory (LSTM) layer to learn the underlying temporal information in the graph snapshots. Lastly, we impose topological similarity and centrality constraints on the model learning to enforce preservation of the topological structure and node centrality roles of the input graphs in the learned embeddings. The proposed TSC-TLP is tested on three real-world temporal social networks. On average, it exhibits a 4% improvement in link prediction accuracy and a 37% reduction in MSE on centrality prediction over the best benchmark.
{"title":"Topological Similarity and Centrality Driven Hybrid Deep Learning for Temporal Link Prediction","authors":"Abubakhari Sserwadda, Alper Ozcan, Y. Yaslan","doi":"10.3897/jucs.99169","DOIUrl":"https://doi.org/10.3897/jucs.99169","url":null,"abstract":"Several real-world phenomena, including social, communication, transportation, and biological networks, can be efficiently expressed as graphs. This enables the deployment of graph algorithms to infer information from such complex network interactions to enhance graph applications’ accuracy, including link prediction, node classification, and clustering. However, the large size and complexity of the network data limit the efficiency of the learning algorithms in making decisions from such graph datasets. To overcome these limitations, graph embedding techniques are usually adopted. However, many studies not only assume static networks but also pay less attention to preserving the network topological and centrality information, which information is key in analyzing networks. In order to fill these gaps, we propose a novel end-to-end unified Topological Similarity and Centrality driven Hybrid Deep Learning model for Temporal Link Prediction (TSC-TLP). First, we extract topological similarity and centrality-based features from the raw networks. Next, we systematically aggregate these topological and centrality features to act as inputs for the encoder. In addition, we leverage the long short-term memory (LSTM) layer to learn the underlying temporal information in the graph snapshots. Lastly, we impose topological similarity and centrality constraints on the model learning to enforce preserving of topological structure and node centrality role of the input graphs in the learned embeddings. The proposed TSC-TLP is tested on 3 real-world temporal social networks. On average, it exhibits a 4% improvement in link prediction accuracy and a 37% reduction in MSE on centrality prediction over the best benchmark.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"10 1","pages":"470-490"},"PeriodicalIF":0.0,"publicationDate":"2023-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78546791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seyedeh Mahsa Mirhoseini-Moghaddam, Mohammad Reza Yamaghani, A. Bakhshipour
In this study, a combined system of electronic nose (e-nose) and computer vision was developed for the detection of adulteration in extra virgin olive oil (EVOO). Canola oil was blended with pure EVOO to produce adulterations at four levels of 5, 10, 15, and 20%. Data collection was carried out using an e-nose system containing 13 metal oxide gas sensors and a computer vision system. Applying principal component analysis (PCA) to the e-nose-extracted features showed that 93% and 92% of the total data variance was covered by the first three PCs generated from the Maximum Sensor Response (MSR) and Area Under Curve (AUC) features, respectively. Cluster analysis verified that the pure and impure EVOO samples can be categorized by e-nose properties. PCA-Quadratic Discriminant Analysis (PCA-QDA) classified the EVOOs with an accuracy of 100%. Multiple Linear Regression (MLR) was able to estimate the adulteration percentage with an R2 of 0.8565 and an RMSE of 2.7125 on the validation dataset. Moreover, factor analysis using Partial Least Squares (PLS) identified the MQ3 and TGS2620 sensors as the most important e-nose sensors for EVOO adulteration monitoring. Application of Response Surface Methodology (RSM) to RGB, HSV, L*, a*, and b* as color parameters of the EVOO images revealed that the color parameters are at their optimal state for up to 0.1% canola impurity, where the obtained desirability index was 97%. The results of this study demonstrated the high capability of e-nose and computer vision systems for accurate, fast, and non-destructive detection of adulteration in EVOO, and suggest that detection of food adulteration may be made more reliable using these artificial senses.
{"title":"Application of Electronic Nose and Eye Systems for Detection of Adulteration in Olive Oil based on Chemometrics and Optimization Approaches","authors":"Seyedeh Mahsa Mirhoseini-Moghaddam, Mohammad Reza Yamaghani, A. Bakhshipour","doi":"10.3897/jucs.90346","DOIUrl":"https://doi.org/10.3897/jucs.90346","url":null,"abstract":"In this study, a combined system of electronic nose (e-nose) and computer vision was developed for the detection of adulteration in extra virgin olive oil (EVOO). The canola oil was blended with the pure EVOO to provide adulterations at four levels of 5, 10, 15, and 20%. Data collection was carried out using an e-nose system containing 13 metal oxide gas sensors, and a computer vision system. Applying principal component analysis (PCA) on the e-nose-extracted features showed that 93% and 92% of total data variance was covered by the three first PCs generated from Maximum Sensor Response (MSR), Area Under Curve (AUC) features, respectively. Cluster analysis verified that the pure and impure EVOO samples can be categorized by e-nose properties. PCA-Quadratic Discriminant Analysis (PCA-QDA) classified the EVOOs with an accuracy of 100%. Multiple Linear Regression (MLR) was able to estimate the adulteration percentage with the R2 of 0.8565 and RMSE of 2.7125 on the validation dataset. Moreover, factor analysis using Partial Least Square (PLS) introduced the MQ3 and TGS2620 sensors as the most important e-nose sensors for EVOO adulteration monitoring. Application of Response Surface Methodology (RSM) on RGB, HSV, L*,a*, and b* as color parameters of the EVOO images revealed that the color parameters are at their optimal state in the case up to 0.1% of canola impurity, where the obtained desirability index was 97%. Results of this study demonstrated the high capability of e-nose and computer vision systems for accurate, fast and non-destructive detection of adulteration in EVOO and detection of food adulteration may be more reliable using these artificial senses. ","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"1 1","pages":"300-325"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73358895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rising use of Internet of Things (IoT)-enabled devices, there is a significant increase in the use of smart applications that provide their response in real time. This rising demand raises many issues such as scheduling, cost, and overloading of servers. To overcome these, a cost-effective scheduling technique has been proposed for the allocation of smart applications. The aim of this paper is to improve the profit obtained by the Fog environment and to minimize the cost of smart applications at the user end. The proposed framework has been evaluated with the help of a test bed containing four analysis phases and is compared on the basis of five metrics: average allocation time, average profit of the Fog environment, average cost of smart applications, resource utilization, and the number of applications run within a given latency. The proposed framework performs better on all of these metrics.
{"title":"Cost-Effective Scheduling in Fog Computing: An Environment Based on Modified PROMETHEE Technique","authors":"Shefali Varshney, Rajinder Sandhu, P. K. Gupta","doi":"10.3897/jucs.90429","DOIUrl":"https://doi.org/10.3897/jucs.90429","url":null,"abstract":"With the rising use of Internet of Things (IoT)-enabled devices, there is a significant increase in the use of smart applications that provide their response in real time. This rising demand imposes many issues such as scheduling, cost, overloading of servers, etc. To overcome these, a cost-effective scheduling technique has been proposed for the allocation of smart applications. The aim of this paper is to provide better profit by the Fog environment and minimize the cost of smart applications from the user end. The proposed framework has been evaluated with the help of a test bed containing four analysis phases and is compared on the basis of five metrics- average allocation time, average profit by the Fog environment, average cost of smart applications, resource utilization and number of applications run within given latency. The proposed framework performs better under all the provided metrics. ","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"7 1","pages":"397-416"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90191301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting human mobility is a key element in the development of intelligent transport systems. Current digital technologies enable capturing a wealth of data on mobility flows between geographic areas, which are then used to train machine learning models to predict these flows. However, most works have only considered a single data source for building these models, or different sources covering the same spatial area. In this paper, we propose to augment a macro open-data mobility study based on cellular phones with data from a road traffic sensor located on a specific motorway in one of the mobility areas of the study. The results show that models trained with the fusion of both types of data, especially long short-term memory (LSTM) and Gated Recurrent Unit (GRU) neural networks, provide a more reliable prediction than models based only on the open data source. These results show that it is possible to predict the traffic entering a particular city in the next 30 minutes with an absolute error of less than 10%. Thus, this work is a further step towards improving the prediction of human mobility in interurban areas by fusing open data with data from IoT systems.
{"title":"Human Mobility Prediction with Region-based Flows and Road Traffic Data","authors":"Fernando Terroso-Sáenz, Andrés Muñoz","doi":"10.3897/jucs.94514","DOIUrl":"https://doi.org/10.3897/jucs.94514","url":null,"abstract":"Predicting human mobility is a key element in the development of intelligent transport systems. Current digital technologies enable capturing a wealth of data on mobility flows between geographic areas, which are then used to train machine learning models to predict these flows. However, most works have only considered a single data source for building these models or different sources but covering the same spatial area. In this paper we propose to augment a macro open-data mobility study based on cellular phones with data from a road traffic sensor located within a specific motorway of one of the mobility areas in the study. The results show that models trained with the fusion of both types of data, especially long short-term memory (LSTM) and Gated Recurrent Unit (GRU) neural networks, provide a more reliable prediction than models based only on the open data source. These results show that it is possible to predict the traffic entering a particular city in the next 30 minutes with an absolute error less than 10%. Thus, this work is a further step towards improving the prediction of human mobility in interurban areas by fusing open data with data from IoT systems.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"14 1","pages":"374-396"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87147768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fahimeh Ramazankhani, Mahdi Yazdian Dehkordi, M. Rezaeian
Features extracted from facial images are used in various fields, such as kinship verification. A kinship verification system determines the kin or non-kin relation between a pair of facial images by analysing their facial features. In this research, different texture and color features are used along with a metric learning method to verify kinship for the four relations of father-son, father-daughter, mother-son, and mother-daughter. First, effective features are fused and NRML metric learning is used to generate a discriminative feature vector; then an SVM classifier is used to verify the kinship relations. To measure the accuracy of the proposed method, the KinFaceW-I and KinFaceW-II databases have been used. The evaluation results show that the feature fusion and NRML metric learning methods are able to improve the performance of the kinship verification system. In addition to the proposed approach, the effect of extracting features from image blocks versus the whole image is investigated and the results are presented. The results indicate that block-wise feature extraction can be effective in improving the final accuracy of kinship verification.
{"title":"Feature Fusion and NRML Metric Learning for Facial Kinship Verification","authors":"Fahimeh Ramazankhani, Mahdi Yazdian Dehkordi, M. Rezaeian","doi":"10.3897/jucs.89254","DOIUrl":"https://doi.org/10.3897/jucs.89254","url":null,"abstract":"Features extracted from facial images are used in various fields such as kinship verification. The kinship verification system determines the kin or non-kin relation between a pair of facial images by analysing their facial features. In this research, different texture and color features have been used along with the metric learning method, to verify the kinship for the four kinship relations of father-son, father-daughter, mother-son and mother-daughter. First, by fusing effective features, NRML metric learning used to generate the discriminative feature vector, then SVM classifier used to verify to kinship relations. To measure the accuracy of the proposed method, KinFaceW-I and KinFaceW-II databases have been used. The results of the evaluations show that the feature fusion and NRML metric learning methods have been able to improve the performance of the kinship verification system. In addition to the proposed approach, the effect of feature extraction from the image blocks or the whole image is investigated and the results are presented. The results indicate that feature extraction in block form, can be effective in improving the final accuracy of kinship verification.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"14 1","pages":"326-348"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86696440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Strydom, Andrei Michael Dreyer, Brink van der Merwe
International Classification of Disease (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient’s medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this paper, we propose a transformer-based architecture with label-wise attention for predicting ICD codes on a medical dataset. The transformer model is first pre-trained from scratch on a medical dataset. Once this is done, the pre-trained model is used to generate representations of the tokens in the clinical documents, which are fed into the label-wise attention layer. Finally, the outputs from the label-wise attention layer are fed into a feed-forward neural network to predict appropriate ICD codes for the input document. We evaluate our model using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III dataset. Our experimental results show that our transformer model outperforms all previous models in terms of micro-F1 for the full label set from the MIMIC-III dataset. This is also the first successful application of a pre-trained transformer architecture to the auto-coding problem on the full MIMIC-III dataset.
{"title":"Automatic assignment of diagnosis codes to free-form text medical note","authors":"Stefan Strydom, Andrei Michael Dreyer, Brink van der Merwe","doi":"10.3897/jucs.89923","DOIUrl":"https://doi.org/10.3897/jucs.89923","url":null,"abstract":"International Classification of Disease (ICD) coding plays a significant role in classify-ing morbidity and mortality rates. Currently, ICD codes are assigned to a patient’s medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this paper, we propose a transformer-based architecture with label-wise attention for predicting ICD codes on a medical dataset. The transformer model is first pre-trained from scratch on a medical dataset. Once this is done, the pre-trained model is used to generate representations of the tokens in the clinical documents, which are fed into the label-wise attention layer. Finally, the outputs from the label-wise attention layer are fed into a feed-forward neural network to predict appropriate ICD codes for the input document. We evaluate our model using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III dataset. Our experimental results show that our transformer model outperforms all previous models in terms of micro-F1 for the full label set from the MIMIC-III dataset. This is also the first successful application of a pre-trained transformer architecture to the auto-coding problem on the full MIMIC-III dataset.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"106 1","pages":"349-373"},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81076947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this paper is to present an undergraduate research experience process model and the evaluation of seven years of its application in an undergraduate research program in software engineering. Undergraduate students who participated in research projects between 2015 and 2022 were surveyed to find out a) their motivations for participating in research projects in software engineering, b) the skills they consider they have acquired or improved by participating in those projects, and c) their perception of benefits and utility for their future work and professional activities. Results reveal that participation in real research projects in software engineering is highly valued by undergraduate students, who perceive benefits in the development of research and soft skills, and for their future professional activity. In addition, these undergraduate research projects and the process followed show that it is feasible to make original contributions to the body of knowledge of software engineering.
{"title":"Undergraduate research in software engineering. An experience and evaluation report","authors":"Gerardo Matturro","doi":"10.3897/jucs.95718","DOIUrl":"https://doi.org/10.3897/jucs.95718","url":null,"abstract":"The purpose of this paper is to present an undergraduate research experience process model and the evaluation of seven years of its application in an undergraduate research program in software engineering. Undergraduate students who participated in research projects between 2015 and 2022 were surveyed to find out a) their motivations for participating in research projects in software engineering, b) the skills they consider they have acquired or improved by participating in those projects, and c) their perception of benefits and utility for their future work and professional activities. Results reveal that participation in real research projects in software engineering is highly valued by undergraduate students, who perceive benefits in the development of research and soft skills, and for their future professional activity. In addition, these undergraduate research projects and the process followed show that it is feasible to make original contributions to the body of knowledge of software engineering.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"29 1","pages":"203-221"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74948532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the increased use of smart devices and digital business opportunities has generated massive amounts of heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support these tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ in the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using a similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach significantly outperforms existing approaches.
{"title":"Leveraging Structural and Semantic Measures for JSON Document Clustering","authors":"Uma Priya D, P. S. Thilagam","doi":"10.3897/jucs.86563","DOIUrl":"https://doi.org/10.3897/jucs.86563","url":null,"abstract":"In recent years, the increased use of smart devices and digital business opportunities has generated massive heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research uses different similarity metrics and clusters the documents to support the above tasks effectively. However, extant approaches have focused on either structural or semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ by the context of the JSON attributes. Therefore, there is a need to consider the structural, semantic, and contextual properties of JSON schemas to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using the similarity fusion method. The similarity fusion matrix is constructed using structural, semantic, and contextual measures of JSON schemas. The experimental results demonstrate that the proposed approach outperforms the existing approaches significantly. ","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"27 1","pages":"222-241"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81175532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shashi Kant Shankar, Adolfo Ruiz-Calleja, L. Prieto, M. Rodríguez-Triana, Pankaj Chejara, Sandesh Tripathi
Multimodal Learning Analytics (MMLA) solutions aim to provide a more holistic picture of a learning situation by processing multimodal educational data. Considering the contextual information of a learning situation is known to help provide more relevant outputs to educational stakeholders. However, most MMLA solutions are still in the prototyping phase and struggle to deal with the different dimensions of an authentic MMLA situation, which involves multiple cross-disciplinary stakeholders such as teachers, researchers, and developers. One of the reasons they remain in the prototyping phase of the development lifecycle is the challenges that software developers face at different levels when developing context-aware MMLA solutions. In this paper, we identify the requirements and propose a data infrastructure called CIMLA. It includes different data processing components following a standard data processing pipeline and considers contextual information following a defined data structure. It has been evaluated in three authentic MMLA scenarios involving different cross-disciplinary stakeholders, following the Software Architecture Analysis Method. Its fitness was analyzed in each of the three scenarios, and developers were interviewed to assess whether it meets functional and non-functional requirements. Results showed that CIMLA supports modularity in developing context-aware MMLA solutions and that each of its modules can be reused, with required modifications, in the development of other solutions. In the future, the current involvement of a developer in customizing the configuration file to consider contextual information can be investigated further.
{"title":"CIMLA: A Modular and Modifiable Data Preparation, Organization, and Fusion Infrastructure to Partially Support the Development of Context-aware MMLA Solutions","authors":"Shashi Kant Shankar, Adolfo Ruiz-Calleja, L. Prieto, M. Rodríguez-Triana, Pankaj Chejara, Sandesh Tripathi","doi":"10.3897/jucs.84558","DOIUrl":"https://doi.org/10.3897/jucs.84558","url":null,"abstract":"Multimodal Learning Analytics (MMLA) solutions aim to provide a more holistic picture of a learning situation by processing multimodal educational data. Considering contextual information of a learning situation is known to help in providing more relevant outputs to educational stakeholders. However, most of the MMLA solutions are still in prototyping phase and dealing with different dimensions of an authentic MMLA situation that involve multiple cross-disciplinary stakeholders like teachers, researchers, and developers. One of the reasons behind still being in prototyping phase of the development lifecycle is related to the challenges that software developers face at different levels in developing context-aware MMLA solutions. In this paper, we identify the requirements and propose a data infrastructure called CIMLA. It includes different data processing components following a standard data processing pipeline and considers contextual information following a data structure. It has been evaluated in three authentic MMLA scenarios involving different cross-disciplinary stakeholders following the Software Architecture Analysis Method. Its fitness was analyzed in each of the three scenarios and developers were interviewed to assess whether it meets functional and non-functional requirements. Results showed that CIMLA supports modularity in developing context-aware MMLA solutions and each of its modules can be reused with required modifications in the development of other solutions. In the future, the current involvement of a developer in customizing the configuration file to consider contextual information can be investigated.","PeriodicalId":14652,"journal":{"name":"J. Univers. Comput. Sci.","volume":"25 1","pages":"265-297"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82094807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}