Pub Date: 2023-07-04 DOI: 10.48550/arXiv.2307.01568
M. Fahad, J. Darmont
Business Intelligence constitutes a set of methodologies and tools for querying, reporting, on-line analytic processing (OLAP), generating alerts, performing business analytics, etc. When different collaborators need to perform these tasks collectively, a Collaborative Business Intelligence (CBI) platform is required. CBI plays a significant role in targeting a common goal among various companies, but it requires them to connect, organize and coordinate with each other to share opportunities while respecting their own autonomy and heterogeneity. This paper presents a CBI platform that democratizes data by allowing BI users to easily connect, share and visualize data among collaborators, obtain actionable answers through collaborative analysis, investigate and make collaborative decisions, and store the analyses along with graphical diagrams and charts in a collaborative ontology knowledge base. Our CBI framework supports and assists information sharing, collaborative decision-making and annotation management beyond the boundaries of individuals and enterprises.
{"title":"An Ontology-based Collaborative Business Intelligence Framework","authors":"M. Fahad, J. Darmont","doi":"10.48550/arXiv.2307.01568","DOIUrl":"https://doi.org/10.48550/arXiv.2307.01568","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"480-487"},"PeriodicalIF":2.6,"publicationDate":"2023-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43532241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-04 DOI: 10.48550/arXiv.2305.04796
John Kalung Leung, Igor Griva, W. Kennedy, J. Kinser, Sohyun Park, Seoyoon Lee
This paper presents an innovative approach to two problems researchers face in Emotion Aware Recommender Systems (EARS): the difficulty of collecting large, good-quality emotion-tagged datasets, and the lack of an effective way to protect users' emotional data privacy. Without enough good-quality emotion-tagged datasets, researchers cannot conduct repeatable affective computing research in EARS that generates personalized recommendations based on users' emotional preferences. Similarly, if users' emotional data privacy is not fully protected, users could resist engaging with EARS services. This paper introduces a method that detects affective features in subjective passages using Generative Pre-trained Transformer technology, forming the basis of the Affective Index and Affective Index Indicator (AII). This eliminates the need for users to build an affective feature detection mechanism. The paper advocates a separation-of-responsibility approach in which users protect their emotional profile data while EARS service providers refrain from retaining or storing it. Service providers can update users' Affective Indices in memory without saving their private data, providing Affective Aware recommendations without compromising user privacy. This paper offers a solution to the subjectivity and variability of emotions, data privacy concerns, and evaluation metrics and benchmarks, paving the way for future EARS research.
{"title":"The Application of Affective Measures in Text-based Emotion Aware Recommender Systems","authors":"John Kalung Leung, Igor Griva, W. Kennedy, J. Kinser, Sohyun Park, Seoyoon Lee","doi":"10.48550/arXiv.2305.04796","DOIUrl":"https://doi.org/10.48550/arXiv.2305.04796","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"590-597"},"PeriodicalIF":2.6,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43494981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-28 DOI: 10.48550/arXiv.2304.14735
Horst Stühler, M. Zöller, Dennis Klau, A. B. Bedrikow, Christian Tutschku
Price forecasting for used construction equipment is a challenging task due to spatial and temporal price fluctuations. It is thus of high interest to automate the forecasting process based on current market data. Even though applying machine learning (ML) to these data represents a promising approach to predict the residual value of certain tools, it is hard to implement for small and medium-sized enterprises due to their insufficient ML expertise. To this end, we demonstrate the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions, which automatically generate the underlying pipelines. We combine AutoML methods with the domain knowledge of the companies. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning part and a non-machine learning part. To take all complex industrial requirements into account and to demonstrate the applicability of our new approach, we designed a novel metric named method evaluation score, which incorporates the most important technical and non-technical metrics for quality and usability. Based on this metric, we show in a case study on the industrial use case of price forecasting that domain knowledge combined with AutoML can weaken the dependence on ML experts for innovative small and medium-sized enterprises which are interested in adopting such solutions.
{"title":"Benchmarking Automated Machine Learning Methods for Price Forecasting Applications","authors":"Horst Stühler, M. Zöller, Dennis Klau, A. B. Bedrikow, Christian Tutschku","doi":"10.48550/arXiv.2304.14735","DOIUrl":"https://doi.org/10.48550/arXiv.2304.14735","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"30-39"},"PeriodicalIF":2.6,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42864663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-12 DOI: 10.48550/arXiv.2303.06720
Maha Asiri, M. Eltabakh
Imperfect databases are very common in many applications for reasons ranging from data-entry errors, transmission or integration errors, and wrong instrument readings, to faulty experimental setups leading to incorrect results. The management and query processing of imperfect databases is a very challenging problem as it requires incorporating the data's qualities within the database engine. Even more challenging, the qualities are typically not static and may evolve over time. Unfortunately, most state-of-the-art techniques treat the data quality problem as an offline task that is in total isolation from the query processing engine (carried out outside the DBMS). Hence, end-users receive the queries' results with no indication of whether or not the results can be trusted for further analysis and decision making. In this paper, we propose the "QTrail-DB" system, which fundamentally extends standard DBMSs to support imperfect databases with evolving qualities. QTrail-DB introduces a new quality model based on the new concept of "Quality Trails", which captures the evolution of the data's qualities over time. QTrail-DB extends the relational data model to incorporate the quality trails within the database system. We propose a new query algebra, called "QTrail Algebra", that enables seamless and transparent propagation and derivation of the data's qualities within a query pipeline. As a result, a query's answer is automatically annotated with quality-related information at the tuple level. The QTrail-DB propagation model leverages the thoroughly studied propagation semantics present in the DB provenance and lineage tracking literature, and thus there is no need to develop a new query optimizer. QTrail-DB is developed within PostgreSQL and experimentally evaluated using real-world datasets to demonstrate its efficiency and practicality.
{"title":"QTrail-DB: A Query Processing Engine for Imperfect Databases with Evolving Qualities","authors":"Maha Asiri, M. Eltabakh","doi":"10.48550/arXiv.2303.06720","DOIUrl":"https://doi.org/10.48550/arXiv.2303.06720","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"295-302"},"PeriodicalIF":2.6,"publicationDate":"2023-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49395702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intan Nurma Yulita, Victor Wijaya, Rudi Rosadi, Indra Sarathan, Yusa Djuyandi, Anton Satria Prabuwono
To address the COVID-19 situation in Indonesia, the Indonesian government has adopted a number of policies. One of them is a vacation-related policy. Government measures with regard to this vacation policy have produced a wide range of viewpoints in society, which have been extensively shared on social media, including YouTube. However, no computerized system has been developed to date that can assess people’s social media reactions. Therefore, this paper provides a sentiment analysis application for this government policy by employing a bidirectional encoder representation from transformers (BERT) approach. The study method comprised data collection, data labeling, data preprocessing, BERT model training, and model evaluation. This study created a new dataset for this topic. The data were collected from the comments section of YouTube and were categorized into three categories: positive, neutral, and negative. This research yielded an F-score of 84.33%. Another contribution of this study is its methodology for processing sentiment analysis in Indonesian. In addition, the model was deployed as an application using the Python programming language and the Flask framework. By utilizing this research, the government can learn the extent to which the public accepts the policies that have been implemented.
{"title":"Analysis of Government Policy Sentiment Regarding Vacation during the COVID-19 Pandemic Using the Bidirectional Encoder Representation from Transformers (BERT)","authors":"Intan Nurma Yulita, Victor Wijaya, Rudi Rosadi, Indra Sarathan, Yusa Djuyandi, Anton Satria Prabuwono","doi":"10.3390/data8030046","DOIUrl":"https://doi.org/10.3390/data8030046","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136173540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-19 DOI: 10.48550/arXiv.2212.09376
Alessandro Temperoni, M. Biryukov, M. Theobald
Relation extraction (RE) is a sub-discipline of information extraction (IE) which focuses on the prediction of a relational predicate from a natural-language input unit (such as a sentence, a clause, or even a short paragraph consisting of multiple sentences and/or clauses). Together with named-entity recognition (NER) and disambiguation (NED), RE forms the basis for many advanced IE tasks such as knowledge-base (KB) population and verification. In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE by encoding structured information about the sentences' principal units, such as subjects, objects, verbal phrases, and adverbials, into various forms of vectorized (and hence unstructured) representations of the sentences. Our main conjecture is that the decomposition of long and possibly convoluted sentences into multiple smaller clauses via OpenIE even helps to fine-tune context-sensitive language models such as BERT (and its plethora of variants) for RE. Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models compared to existing RE approaches. Our best results reach F1 scores of 92% and 71% for KnowledgeNet and FewRel, respectively, proving the effectiveness of our approach on competitive benchmarks.
{"title":"Enriching Relation Extraction with OpenIE","authors":"Alessandro Temperoni, M. Biryukov, M. Theobald","doi":"10.48550/arXiv.2212.09376","DOIUrl":"https://doi.org/10.48550/arXiv.2212.09376","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"359-366"},"PeriodicalIF":2.6,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47593248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-05 DOI: 10.5220/0011265900003269
Sumana Biswas, Karen Young, J. Griffith
Labelling a large quantity of social media data for the task of supervised machine learning is not only time-consuming but also difficult and expensive. On the other hand, the accuracy of supervised machine learning models is strongly related to the quality of the labelled data on which they train, and automatic sentiment labelling techniques could reduce the time and cost of human labelling. We have compared three automatic sentiment labelling techniques: TextBlob, Vader, and Afinn to assign sentiments to tweets without any human assistance. We compare three scenarios: one uses training and testing datasets with existing ground truth labels; the second experiment uses automatic labels as training and testing datasets; and the third experiment uses three automatic labelling techniques to label the training dataset and uses the ground truth labels for testing. The experiments were evaluated on two Twitter datasets: SemEval-2013 (DS-1) and SemEval-2016 (DS-2). Results show that the Afinn labelling technique obtains the highest accuracy of 80.17% (DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model. These findings imply that automatic text labelling could provide significant benefits, and suggest a feasible alternative to the time and cost of human labelling efforts.
{"title":"A Comparison of Automatic Labelling Approaches for Sentiment Analysis","authors":"Sumana Biswas, Karen Young, J. Griffith","doi":"10.5220/0011265900003269","DOIUrl":"https://doi.org/10.5220/0011265900003269","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"312-319"},"PeriodicalIF":2.6,"publicationDate":"2022-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43861695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
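The lexicon-based labelling compared in the abstract above can be illustrated with a minimal sketch. It assumes an AFINN-style approach in which each word carries an integer valence and the sign of the summed score decides the label. The lexicon `TOY_LEXICON`, the function `label_text`, and the zero threshold are hypothetical choices made for illustration; this is not the authors' pipeline and not the API of the TextBlob, VADER, or Afinn packages.

```python
# Toy AFINN-style labeller. The lexicon is a hypothetical six-word stand-in;
# a real valence lexicon covers far more vocabulary.
TOY_LEXICON = {
    "good": 3, "great": 3, "love": 3,
    "bad": -3, "terrible": -3, "hate": -3,
}


def label_text(text, lexicon=TOY_LEXICON):
    """Sum per-word valences and map the total to a sentiment label."""
    # Naive whitespace tokenisation with surrounding punctuation stripped.
    tokens = [tok.strip(".,!?;:").lower() for tok in text.split()]
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love this phone, great battery!",
    "Terrible service, I hate waiting.",
    "The meeting is at noon.",
]
labels = [label_text(t) for t in tweets]
print(labels)
```

Labels produced this way could then serve as training targets for a supervised model, which corresponds to the third scenario the abstract describes (automatic labels for training, ground-truth labels for testing).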
Pub Date: 2022-11-01 Epub Date: 2022-10-30 DOI: 10.3390/data7110148
Zoe Ezzes, Sarah M Schneck, Marianne Casilio, Davida Fromm, Antje Mefford, Michael R de Riesthal, Stephen M Wilson
Purpose: Auditory-perceptual rating of connected speech in aphasia (APROCSA) involves trained listeners rating a large number of perceptual features of speech samples, and has shown promise as an approach for quantifying expressive speech and language function in individuals with aphasia. The aim of this study was to obtain consensus ratings for a diverse set of speech samples, which can then be used as training materials for learning the APROCSA system.
Method: Connected speech samples were recorded from six individuals with chronic post-stroke aphasia. A segment containing the first five minutes of participant speech was excerpted from each sample, and 27 features were rated on a five-point scale by five researchers. The researchers then discussed each feature in turn to obtain consensus ratings.
Results: Six connected speech samples are made freely available for research, education, and clinical uses. Consensus ratings are reported for each of the 27 features, for each speech sample. Discrepancies between raters were resolved through discussion, yielding consensus ratings that can be expected to be more accurate than mean ratings.
Conclusions: The dataset will provide a useful resource for scientists, students, and clinicians to learn how to evaluate aphasic speech samples with an auditory-perceptual approach.
{"title":"An open dataset of connected speech in aphasia with consensus ratings of auditory-perceptual features.","authors":"Zoe Ezzes, Sarah M Schneck, Marianne Casilio, Davida Fromm, Antje Mefford, Michael R de Riesthal, Stephen M Wilson","doi":"10.3390/data7110148","DOIUrl":"https://doi.org/10.3390/data7110148","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"7 11","pages":""},"PeriodicalIF":2.6,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10617630/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71427627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-07-26 DOI: 10.48550/arXiv.2207.12764
A. F. Ghahfarokhi, Fatemeh Akoochekian, F. Zandkarimi, Wil M.P. van der Aalst
Process mining provides various algorithms to analyze process executions based on event data. Process discovery, the most prominent category of process mining techniques, aims to discover process models from event logs; however, it leads to spaghetti models when working with real-life data. Therefore, several clustering techniques have been proposed on top of traditional event logs (i.e., event logs with a single case notion) to reduce the complexity of process models and discover homogeneous subsets of cases. Nevertheless, in real-life processes, particularly in the context of Business-to-Business (B2B) processes, multiple objects are involved in a process. Recently, Object-Centric Event Logs (OCELs) have been introduced to capture the information of such processes, and several process discovery techniques have been developed on top of OCELs. Yet, applying the proposed discovery techniques to real OCELs yields more informative but also more complex models. In this paper, we propose a clustering-based approach to cluster similar objects in OCELs to simplify the obtained process models. Using a case study of a real B2B process, we demonstrate that our approach reduces the complexity of the process models and generates coherent subsets of objects which help the end-users gain insights into the process.
{"title":"Clustering Object-Centric Event Logs","authors":"A. F. Ghahfarokhi, Fatemeh Akoochekian, F. Zandkarimi, Wil M.P. van der Aalst","doi":"10.48550/arXiv.2207.12764","DOIUrl":"https://doi.org/10.48550/arXiv.2207.12764","url":null,"PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"444-451"},"PeriodicalIF":2.6,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46761795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-17 DOI: 10.5220/0011301200003269
Sasindu Wijeratne, Ta-Yang Wang, R. Kannan, V. Prasanna
Tensor decomposition has become an essential tool in many data science applications. Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is the pivotal kernel in tensor decomposition algorithms that decompose large, higher-order, real-world tensors into multiple matrices. Accelerating MTTKRP can speed up the tensor decomposition process immensely. Sparse MTTKRP is a challenging kernel to accelerate due to its irregular memory access characteristics. Implementing accelerators on Field Programmable Gate Arrays (FPGAs) for kernels such as MTTKRP is attractive due to the energy efficiency and inherent parallelism of FPGAs. This paper discusses the opportunities, key challenges, and an approach for designing a custom memory controller on FPGA for MTTKRP, and explores the parameter space of such a controller.
{"title":"Towards Programmable Memory Controller for Tensor Decomposition","authors":"Sasindu Wijeratne, Ta-Yang Wang, R. Kannan, V. Prasanna","doi":"10.5220/0011301200003269","DOIUrl":"https://doi.org/10.5220/0011301200003269","PeriodicalId":36824,"journal":{"name":"Data","volume":"1 1","pages":"468-475"},"PeriodicalIF":2.6,"publicationDate":"2022-07-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48646267"}