Data Technologies and Applications: latest articles

Understanding customer behavior by mapping complaints to personality based on social media textual data

IF 1.6; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2024-09-09 DOI: 10.1108/dta-02-2024-0162

Andry Alamsyah, Fadiah Nadhila, Nabila Kalvina Izumi

Purpose

Technology is a key catalyst in shaping society and the economy, and it significantly alters customer dynamics. With a deep understanding of these evolving behaviors, a service can be tailored to each customer's unique needs and personality. We introduce a strategy that links customer complaints to the complaining customers' personality traits, enabling responses that resonate with each customer's personality.

Design/methodology/approach

Our approach has two parts. First, we apply the customer complaints ontology (CCOntology) framework, combined with machine-learning-based multi-class classification, to classify complaints. Second, we use the personality measurement platform (PMP), which is based on the Big Five personality model, to predict customers' personalities. We build the framework for the Indonesian language by extracting tweets containing customer complaints directed at Indonesia's three largest e-commerce services.
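
The abstract does not name the machine learning algorithm behind the multi-class complaint classifier. As a minimal sketch only, the Python below swaps in a multinomial Naive Bayes classifier with Laplace smoothing. The complaint classes (`delivery`, `refund`, `app`), the toy Indonesian tweets and all names are hypothetical, and the CCOntology categories themselves are not modelled.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization only; a real Indonesian pipeline would
    # also normalise slang, strip URLs and mentions, and remove stop words.
    return text.lower().split()


class NaiveBayesComplaintClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, n_label_docs in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_label_docs / self.n_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical complaint tweets and complaint classes.
train_texts = [
    "paket belum sampai sudah seminggu",
    "kurir telat kirim paket",
    "refund belum masuk saldo",
    "dana refund tidak kembali",
    "aplikasi error tidak bisa login",
    "aplikasi crash saat checkout",
]
train_labels = ["delivery", "delivery", "refund", "refund", "app", "app"]
clf = NaiveBayesComplaintClassifier().fit(train_texts, train_labels)
print(clf.predict("paket saya belum sampai"))  # delivery
```

Naive Bayes is used here only because it fits in a few lines without external libraries; any multi-class text classifier could occupy the same slot.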

Findings

By mapping customer complaints to the customers' personality types, we can identify specific personality traits associated with customer dissatisfaction. This allows the way solutions are offered to be personalized according to those traits.
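
One simple way to picture the complaint-to-personality mapping is a contingency table of complaint class against each customer's dominant Big Five trait. The sketch below assumes per-customer trait scores are already available (for example from a personality predictor); the scores, class names and helper functions are illustrative assumptions, not the PMP's actual output format.

```python
from collections import Counter, defaultdict

BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")


def dominant_trait(scores):
    # Highest-scoring Big Five trait; ties go to the trait listed first in BIG_FIVE.
    return max(BIG_FIVE, key=lambda trait: scores.get(trait, 0.0))


def complaint_trait_table(records):
    """Count (complaint class, dominant trait) pairs."""
    return Counter((category, dominant_trait(scores)) for category, scores in records)


def top_trait_per_class(table):
    # For each complaint class, the dominant trait that occurs most often.
    best = defaultdict(lambda: (None, 0))
    for (category, trait), count in table.items():
        if count > best[category][1]:
            best[category] = (trait, count)
    return {category: trait for category, (trait, _) in best.items()}


# Hypothetical classified complaints with hypothetical trait scores.
records = [
    ("late delivery", {"neuroticism": 0.8, "agreeableness": 0.3}),
    ("late delivery", {"neuroticism": 0.7, "openness": 0.5}),
    ("late delivery", {"extraversion": 0.6, "neuroticism": 0.2}),
    ("refund", {"conscientiousness": 0.9, "neuroticism": 0.4}),
]
table = complaint_trait_table(records)
print(top_trait_per_class(table))
# {'late delivery': 'neuroticism', 'refund': 'conscientiousness'}
```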

Originality/value

This research advances the state of the art in personalized-service research based on captured customer behavior, and it fills a research gap by taking customer personality into account. We provide comprehensive insights by aligning customer feedback with the corresponding personality traits extracted from social media data. The result is a highly customized response mechanism attuned to individual customer preferences and requirements.

Citations: 0

A systematic review of the use of FHIR to support clinical research, public health and medical education

Data Technologies and Applications

Pub Date : 2024-09-03 DOI: 10.1108/dta-11-2023-0804

João Pavão, Rute Bastardo, Nelson Pacheco Rocha

Purpose

This systematic review aimed to identify and categorize applications using Fast Healthcare Interoperability Resources (FHIR) to support activities outside of direct healthcare provision.

Design/methodology/approach

A systematic electronic search was performed, and 53 studies were included after the selection process.

Findings

The results show that FHIR is being used to support (1) clinical research (i.e. clinical research based on interventional trials, data interoperability to support clinical research and advanced communication services to support clinical research), (2) public health and (3) medical education. Despite FHIR's potential to support activities outside direct healthcare provision, some barriers were identified, namely difficulties in translating the proposed applications into clinical environments and FHIR technical issues that require further development.
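
For readers unfamiliar with FHIR, the snippet below shows the kind of structured resource and RESTful search that underpins the data-interoperability use cases listed above. It is a minimal sketch assuming FHIR R4 JSON: the base URL `https://fhir.example.org/r4`, the patient values and the helper names are hypothetical, and only a handful of `Patient` elements are shown.

```python
import json
from urllib.parse import urlencode


def make_patient(patient_id, family, given, gender, birth_date):
    """Build a minimal FHIR R4 Patient resource as a plain dict."""
    allowed = {"male", "female", "other", "unknown"}  # administrative-gender codes
    if gender not in allowed:
        raise ValueError(f"gender must be one of {sorted(allowed)}")
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "name": [{"family": family, "given": list(given)}],
        "gender": gender,
        "birthDate": birth_date,  # FHIR date format YYYY-MM-DD
    }


def patient_search_url(base_url, **params):
    """Compose a FHIR RESTful search URL of the form [base]/Patient?param=value."""
    return f"{base_url.rstrip('/')}/Patient?{urlencode(params)}"


patient = make_patient("example-1", "Silva", ["Ana"], "female", "1970-05-01")
print(json.dumps(patient, indent=2))
# "ge" is the FHIR search prefix for "greater than or equal to" on dates.
print(patient_search_url("https://fhir.example.org/r4/",
                         family="Silva", birthdate="ge1970-01-01"))
```

A research or public-health system would typically issue such searches against a FHIR server and receive a `Bundle` of matching resources in return.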

Originality/value

This study provides a broad review of how FHIR is being applied in clinical activities outside direct clinical care and identifies three major domains: clinical research, public health and medical education, of which clinical research is the most represented in terms of number of publications.

Citations: 0

Novel framework for learning performance prediction using pattern identification and deep learning

Data Technologies and Applications

Pub Date : 2024-08-21 DOI: 10.1108/dta-09-2023-0539

Cheng-Hsiung Weng, Cheng-Kui Huang

Purpose

Educational data mining (EDM) discovers significant patterns in educational data and can thus help explain the relations between learners and their educational settings. However, most previous data mining techniques focus on predicting learners' performance without integrating techniques for identifying learning patterns.

Design/methodology/approach

This study proposes a new framework for identifying learning patterns and predicting learning performance. It integrates two modules, a learning pattern identification module and a deep neural network (DNN) prediction module, to identify differences in learning performance and to predict learning performance from student profiles.

Findings

Experimental results on survey data indicate that the proposed learning pattern identification module can help identify valuable difference (change) patterns in students' profiles. The proposed learning performance prediction module, which adopts a DNN, also outperforms traditional machine learning techniques on prediction performance metrics.
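
The abstract does not specify how difference (change) patterns are mined. A minimal sketch, assuming a contrast-set style approach: compare how often each combination of profile attributes occurs in two student groups and keep the combinations whose support differs by at least a threshold. The attribute names, group data and the `min_diff` cut-off are hypothetical, and the DNN prediction module is not shown.

```python
from itertools import combinations


def support(profiles, itemset):
    # Fraction of profiles that contain every (attribute, value) pair in itemset.
    if not profiles:
        return 0.0
    hits = sum(1 for p in profiles if all(p.get(a) == v for a, v in itemset))
    return hits / len(profiles)


def difference_patterns(group_a, group_b, max_len=2, min_diff=0.5):
    """Itemsets whose support in group_a minus support in group_b is large."""
    items = sorted({(a, v) for p in group_a + group_b for a, v in p.items()})
    results = []
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            # Skip itemsets that use the same attribute twice.
            if len({a for a, _ in itemset}) < k:
                continue
            diff = support(group_a, itemset) - support(group_b, itemset)
            if abs(diff) >= min_diff:
                results.append((itemset, round(diff, 3)))
    # Largest absolute differences first.
    return sorted(results, key=lambda r: -abs(r[1]))


# Hypothetical survey profiles for high- and low-performing students.
high = [
    {"study_hours": "high", "attendance": "regular"},
    {"study_hours": "high", "attendance": "regular"},
    {"study_hours": "low", "attendance": "regular"},
]
low = [
    {"study_hours": "low", "attendance": "irregular"},
    {"study_hours": "low", "attendance": "regular"},
    {"study_hours": "low", "attendance": "irregular"},
]
for itemset, diff in difference_patterns(high, low):
    print(itemset, diff)
```

Positive differences mark patterns more typical of the first group, negative ones patterns more typical of the second.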

Originality/value

To the best of our knowledge, the framework is the only educational system in the literature for both identifying learning patterns and predicting learning performance.

Citations: 0

A comparative analysis of job satisfaction prediction models using machine learning: a mixed-method approach

Data Technologies and Applications

Pub Date : 2024-08-14 DOI: 10.1108/dta-10-2023-0697

Jaekyeong Kim, Pil-Sik Chang, Sung-Byung Yang, Ilyoung Choi, Byunghyun Lee

Purpose

Because the food service industry depends more on customer contact and human resources than other industries do, it is crucial to understand the factors influencing employee job satisfaction so that employees provide satisfactory service to customers. However, few studies have incorporated employee reviews from job portals into their research. Many job seekers tend to trust company reviews posted by employees on job portals more than the information provided by the company itself. This study therefore used company reviews and job satisfaction ratings posted by food service employees on the job portal Job Planet to conduct mixed-method research.

Design/methodology/approach

For the qualitative research, we applied the Latent Dirichlet Allocation (LDA) model to food service company reviews to identify 10 job satisfaction factors that employees consider important. For the quantitative research, four algorithms were used to predict job satisfaction ratings: regression tree, multilayer perceptron (MLP), random forest and XGBoost. Specifically, we generated predictor variables for six cases from the LDA topic probabilities and the five-point job satisfaction ratings, and used them to build the prediction models.
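
Of the four algorithms listed, the regression tree is the easiest to show end to end. The from-scratch sketch below builds a small CART-style tree that splits on LDA topic probabilities to predict a five-point rating, choosing each split by minimum sum of squared errors. The two topic features, the ratings and the depth limits are hypothetical; this is not the authors' implementation.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    # Sum of squared errors around the mean (the node impurity for regression).
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_samples=2):
    """Recursively split rows of X (topic probabilities) to predict ratings y."""
    if depth >= max_depth or len(y) < 2 * min_samples or len(set(y)) == 1:
        return {"leaf": mean(y)}
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:
        return {"leaf": mean(y)}
    _, j, threshold, left, right = best
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical features: [P(topic "long hours"), P(topic "good atmosphere")].
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [2, 1, 2, 4, 5, 4]  # hypothetical five-point ratings
tree = build_tree(X, y)
print(predict(tree, [0.15, 0.85]))
```

Random forest and XGBoost build ensembles of trees like this one, which is one reason the four algorithms can be compared on the same topic-probability features.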

Findings

The analysis showed that algorithm accuracy varied across the six cases and that, overall, factors such as work-life balance and work environment have a significant impact on predicting job satisfaction ratings.

Originality/value

This study is significant because its methodology and results suggest a new, data-analysis-based approach in the field of human resources, which can contribute to the future operation and planning of corporate human resources management.

Citations: 0

Assessing the alignment of corporate ESG disclosures with the UN sustainable development goals: a BERT-based text analysis

Data Technologies and Applications

Pub Date : 2024-08-14 DOI: 10.1108/dta-01-2024-0065

Hyogon Kim, Eunmi Lee, Donghee Yoo

Purpose

This study aims to provide measurable information for evaluating a company's ESG performance, based on the conceptual connection between ESG (a company's non-financial elements) and the UN Sustainable Development Goals (SDGs) for resolving global issues.

Design/methodology/approach

A novel data processing method based on BERT is presented and applied to analyze the changes and characteristics of SDG-related ESG texts in companies' disclosures over the past decade. Specifically, ESG-related sentences are extracted from 93,277 Form 10-K filings disclosed between 2010 and 2022, and the similarity between these sentences and SDG statements is calculated using sentence transformers. A classifier is then created by fine-tuning FinBERT, a pre-trained language model specific to the financial domain, to classify the sentences into eight ESG classes.
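
The similarity step can be sketched as cosine similarity between sentence embeddings and SDG-statement embeddings. In practice the vectors would come from a sentence-transformer model (for example via the `sentence-transformers` library); here tiny hand-written 3-dimensional vectors stand in for them, and the 0.5 threshold, the SDG keys and the function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_to_sdgs(sentence_vecs, sdg_vecs, threshold=0.5):
    """For each sentence, return (closest SDG key, rounded score), or None
    when even the closest SDG statement scores below the threshold."""
    matches = []
    for vec in sentence_vecs:
        best_key, best_score = max(
            ((key, cosine(vec, sdg_vec)) for key, sdg_vec in sdg_vecs.items()),
            key=lambda pair: pair[1],
        )
        matches.append((best_key, round(best_score, 3)) if best_score >= threshold else None)
    return matches


# Toy stand-ins for embeddings of SDG statements and 10-K sentences.
sdg_vecs = {"SDG7": [1.0, 0.0, 0.0], "SDG13": [0.0, 1.0, 0.0]}
sentence_vecs = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.6], [0.0, 0.0, 1.0]]
print(match_to_sdgs(sentence_vecs, sdg_vecs))
# [('SDG7', 0.994), ('SDG13', 0.8), None]
```

Sentences that match no SDG above the threshold are dropped here; the FinBERT classification into ESG classes would then run only on the retained sentences.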

Findings

The quantified results obtained from the classifier reveal several implications. First, the number of SDG-related ESG sentences has increased slowly and steadily over the past decade. Second, large-cap companies make relatively more SDG-related ESG disclosures than small-cap companies. Third, significant events such as the COVID-19 pandemic strongly affect changes in disclosure content.

Originality/value

This study presents a novel approach to textual analysis using neural network-based language models such as BERT. The results of this study provide meaningful information and insights for investors in socially responsible investment and sustainable investment and suggest that corporations need a long-term plan regarding ESG disclosures.

Citations: 0

Analysis of CEO career patterns using machine learning: taking US university graduates as an example

Data Technologies and Applications

Pub Date : 2024-08-02 DOI: 10.1108/dta-04-2023-0132

Chia Yu Hung, Eddie Jeng, Li Chen Cheng

Purpose

This study explores the career trajectories of Chief Executive Officers (CEOs) to uncover the characteristics that contribute to their success. Using web scraping and machine learning techniques, it analyzes more than two thousand CEO profiles from LinkedIn to identify patterns in their career paths. This offers an alternative to the predominantly qualitative methods used in previous research.

Design/methodology/approach

This study proposes a framework for analyzing CEO career patterns. Job titles and company information are encoded using the Standard Occupational Classification (SOC) scheme. The study uses the Needleman-Wunsch optimal matching algorithm to construct distance matrices and an agglomerative approach to cluster CEO career paths.
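
Optimal matching, the cost-minimising form of Needleman-Wunsch global alignment used in sequence analysis, turns each pair of career sequences into a distance; agglomerative clustering then repeatedly merges the closest groups. The sketch assumes an indel cost of 1, a substitution cost of 2 and average linkage, none of which the abstract states. The career sequences are hypothetical and use SOC major-group codes for illustration (11 Management, 13 Business and Financial Operations, 15 Computer and Mathematical).

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance via Needleman-Wunsch dynamic programming."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,      # delete from a
                          d[i][j - 1] + indel,      # insert into a
                          d[i - 1][j - 1] + cost)   # match or substitute
    return d[n][m]


def average_linkage(items, dist, n_clusters):
    """Agglomerative clustering with average linkage; returns index lists."""
    D = [[dist(x, y) for y in items] for x in items]
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [D[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical yearly career sequences (one SOC major group per year).
careers = [
    ["15", "15", "15", "11", "11"],
    ["15", "15", "11", "11", "11"],
    ["13", "13", "13", "11", "11"],
    ["13", "13", "11", "11"],
]
print(average_linkage(careers, om_distance, n_clusters=2))  # [[0, 1], [2, 3]]
```

With a substitution cost of exactly twice the indel cost, substitutions never beat a deletion plus an insertion, so this distance reduces to counting unmatched years; other cost settings weight job changes differently.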

Findings

This study gathered data on the career transition processes of graduates from several renowned public and private universities in the United States via LinkedIn. Employing machine learning techniques, the analysis revealed diverse career trajectories. The findings offer career guidance for individuals from various academic backgrounds aspiring to become CEOs.

Research limitations/implications

Building a career sequence that accounts for the number of years spent in each role requires integer durations. Non-integer durations were rounded up to facilitate the optimal matching process, but this prevents a perfectly accurate representation of time worked.
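
The effect of this rounding can be made concrete. Assuming each career spell is expanded into one state per year, rounding fractional durations up lengthens short spells; the helper below (a hypothetical name) shows the effect on a made-up two-spell career.

```python
import math


def to_yearly_sequence(spells):
    """Expand (state, years) spells into one state per year, rounding up."""
    sequence = []
    for state, years in spells:
        # math.ceil turns e.g. 2.3 years into 3 yearly slots.
        sequence.extend([state] * math.ceil(years))
    return sequence


spells = [("13", 2.3), ("11", 1.5)]  # hypothetical SOC major groups and durations
seq = to_yearly_sequence(spells)
print(seq)                                          # ['13', '13', '13', '11', '11']
print(len(seq), sum(years for _, years in spells))  # 5 3.8
```

Here 3.8 years of actual work become 5 yearly slots, which is the inaccuracy the limitation describes.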

Practical implications

This study makes an original contribution to career pattern analysis by revealing distinct groups of CEO career paths in the rich LinkedIn online dataset. Our CEO profiles are not restricted to any industry or to any specific path to becoming a CEO. Because people who hold CEO positions are usually perceived by society as successful, we aim to find the characteristics behind their success and whether the titles they held or the companies they worked for show patterns that explain how they reached their current positions.

Originality/value

In fact, nearly all CEOs had worked for a non-Fortune organization before joining a Fortune company. Among those who have worked for Fortune firms, more CEOs had experience in Fortune 500 firms than in Fortune 1,000 firms.

Citations: 0

MD-LDA: a supervised LDA topic model for identifying mechanism of disease in TCM

Data Technologies and Applications

Pub Date : 2024-07-22 DOI: 10.1108/dta-12-2023-0868

Meiwen Li, Liye Xia, Qingtao Wu, Lin Wang, Junlong Zhu, Mingchuan Zhang

Purpose

In traditional Chinese medicine (TCM), the mechanism of disease (MD) is an essential element of syndrome differentiation and treatment, explaining the mechanisms underlying the occurrence, progression, alteration and outcome of diseases. However, research on intelligent diagnosis has paid little attention to the analysis of MD.

Design/methodology/approach

In this paper, we propose a supervised Latent Dirichlet Allocation (LDA) topic model, termed MD-LDA, which models the process of MD identification. We use the label information inherent in the data as prior knowledge and incorporate it into model training. We also devise two parallel parameter estimation algorithms for efficient training, and we introduce a benchmark MD identification dataset, named TMD, for training MD-LDA. Finally, we validate the performance of MD-LDA through comprehensive experiments.
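
The abstract says MD-LDA uses document labels as prior knowledge but does not give its sampler. As a point of comparison only, the sketch below implements collapsed Gibbs sampling for Labeled LDA, a well-known supervised LDA variant in which each document may only use topics from its own label set. The symptom tokens, mechanism-of-disease labels, hyperparameters and function name are hypothetical, and this is neither MD-LDA nor either of its parallel estimation algorithms.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling where document d may only use topics in doc_labels[d]."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_tw = defaultdict(lambda: defaultdict(int))  # topic -> word -> count
    n_t = defaultdict(int)                         # topic -> total tokens
    n_dt = [defaultdict(int) for _ in docs]        # doc -> topic -> count
    assign = []
    # Random initialisation restricted to each document's labels.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.choice(doc_labels[d])
            z_d.append(t)
            n_tw[t][w] += 1
            n_t[t] += 1
            n_dt[d][t] += 1
        assign.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                t = assign[d][i]
                # Remove the current token from the counts.
                n_tw[t][w] -= 1
                n_t[t] -= 1
                n_dt[d][t] -= 1
                # Full conditional, evaluated only over the allowed labels.
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                           for k in labels]
                t = rng.choices(labels, weights=weights)[0]
                assign[d][i] = t
                n_tw[t][w] += 1
                n_t[t] += 1
                n_dt[d][t] += 1
    topics = sorted({k for labels in doc_labels for k in labels})
    # Smoothed topic-word distributions phi[k][w].
    phi = {k: {w: (n_tw[k][w] + beta) / (n_t[k] + V * beta) for w in vocab} for k in topics}
    return phi, assign


# Hypothetical symptom "documents" labelled with hypothetical mechanisms.
docs = [
    ["fever", "thirst", "red_tongue"],
    ["chills", "pale_tongue", "cold_limbs"],
    ["fever", "thirst", "chills", "pale_tongue"],
]
doc_labels = [["heat"], ["cold"], ["heat", "cold"]]
phi, assign = labeled_lda_gibbs(docs, doc_labels)
print(assign[2])
```

Single-label documents are pinned to their label, so the mixed document's tokens are pulled toward whichever mechanism those pinned documents associate them with.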

Findings

The results show that MD-LDA is both effective and efficient. MD-LDA outperforms state-of-the-art topic models in perplexity, Kullback–Leibler (KL) divergence and classification performance.
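
Perplexity and KL divergence are standard topic-model metrics, but the abstract does not say exactly which distributions the KL comparison uses. The sketch below assumes a common convention: perplexity is the exponential of the negative mean per-token log-likelihood on held-out text (lower is better), and KL divergence is averaged over pairs of topic-word distributions as a measure of topic distinctiveness (higher is better). The function names are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_pairwise_kl(topics):
    """Average symmetric KL over all pairs of topic-word distributions (assumed convention)."""
    pairs = list(combinations(topics, 2))
    total = sum(kl_divergence(p, q) + kl_divergence(q, p) for p, q in pairs)
    return total / (2 * len(pairs))


def perplexity(token_log_probs):
    """exp(-mean log p(w)) over held-out tokens, using natural-log probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

For example, a model that assigns probability 0.25 to every held-out token has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.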

Originality/value

MD-LDA can be applied to MD discovery and analysis in TCM clinical diagnosis, improving the interpretability and reliability of intelligent diagnosis and treatment.

Citations: 0

Early identification of high attention content for online mental health community users based on multi-level fusion model

IF 1.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2024-07-12 DOI: 10.1108/dta-06-2023-0230

Song Wang, Ying Luo, Xinmin Liu

Purpose

The overload of user-generated content in online mental health communities obscures what participating groups focus on and resonate with. The purpose of this paper is therefore to build a mechanism for the early identification of content that attracts high user attention, to support early intervention and the effective dissemination of professional medical guidance.

Design/methodology/approach

We decompose the identification mechanism into two processes: early feature combing and algorithmic model construction. First, based on the differing needs and concerns of the participant groups, multiple features of the form “information content + source users” are refined. Second, a multi-level fusion model is constructed to process these features. Specifically, a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Linear pipeline refines the semantic features, while a Graph Attention Network (GAT) captures entity attribute and relation features. Finally, a Convolutional Neural Network (CNN) optimizes the fused multi-level features.
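
The GAT component can be illustrated with a generic single-head graph attention layer in the style of Veličković et al. (2018). The dependency-free Python sketch below is an assumption-laden illustration: the paper's graph construction (which entities and relations become nodes and edges), number of heads and dimensions are not stated in the abstract, and the names `gat_layer`, `W` and `a` are hypothetical. Each node attends over its neighbours with softmax-normalised scores LeakyReLU(a^T [Wh_i || Wh_j]) and aggregates their projected features.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def gat_layer(H, adj, W, a):
    """Single-head graph attention layer (generic sketch, not the paper's configuration).

    H:   list of node feature vectors, each of length F_in
    adj: dict node -> list of neighbour indices (include self-loops)
    W:   F_out x F_in projection matrix
    a:   attention vector of length 2 * F_out
    Returns (new features after ELU, attention coefficients per node).
    """
    Wh = [matvec(W, h) for h in H]
    f_out = len(W)
    out, attn = [], {}
    for i in range(len(H)):
        nbrs = adj[i]
        # Unnormalised scores e_ij = LeakyReLU(a^T [Wh_i || Wh_j]).
        scores = [leaky_relu(sum(a[k] * Wh[i][k] for k in range(f_out))
                             + sum(a[f_out + k] * Wh[j][k] for k in range(f_out)))
                  for j in nbrs]
        m = max(scores)
        exp_s = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exp_s)
        alpha = [e / total for e in exp_s]
        attn[i] = dict(zip(nbrs, alpha))
        # Attention-weighted sum of neighbours' projected features.
        agg = [sum(alpha[t] * Wh[j][k] for t, j in enumerate(nbrs)) for k in range(f_out)]
        out.append([x if x > 0 else math.exp(x) - 1 for x in agg])  # ELU activation
    return out, attn
```

With an all-zero attention vector every score is equal, so each node simply averages its neighbours; learning `a` is what lets the layer weight some relations more heavily than others.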

Findings

The multi-level fusion model achieves an accuracy (ACC) of 84.42%, an F1 score of 79.43% and a recall (R) of 76.71%. Compared with the baseline models and with single feature elements, ACC and F1 improve to varying degrees.
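
Precision is not reported. If, and only if, the F1 score is the harmonic mean of precision and recall computed on the same basis (for example for a single positive class; a macro-averaged F1 generally is not), the implied precision can be recovered by inverting F1 = 2PR/(P + R). The helper names below are hypothetical, and the result is a back-of-the-envelope check rather than a figure from the paper.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P; only defined when 2R > F1."""
    if 2 * recall <= f1:
        raise ValueError("no valid precision for these F1 and recall values")
    return f1 * recall / (2 * recall - f1)


p = implied_precision(0.7943, 0.7671)
print(round(p, 4))  # prints 0.8235
```

Under that assumption the model would trade a somewhat lower recall for a precision of roughly 82%.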

Originality/value

The originality of this paper lies in analyzing multiple features available at an early stage and constructing a new multi-level fusion model to process them. The study is also valuable for identifying the needs of patients with psychological conditions and for providing early professional medical guidance.

Citations: 0

Tracking the size of the estimation window in time-series data

IF 1.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2024-06-12 DOI: 10.1108/dta-11-2023-0797

Tae Yeon Kwon

Purpose

This paper introduces a novel method, Variance Rule-based Window Size Tracking (VR-WT), for deriving a sequence of estimation window sizes. This approach not only identifies structural change points but also ascertains the optimal size of the estimation window. VR-WT is designed to achieve accurate model estimation and is versatile enough to be applied across a range of models in various disciplines.

Design/methodology/approach

VR-WT derives a sequence of estimation window sizes. Its design is inspired by the Potential Scale Reduction Factor (PSRF), a diagnostic used to evaluate the convergence and stationarity of Markov chain Monte Carlo (MCMC) simulations.
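
The PSRF (Gelman-Rubin statistic) that inspires VR-WT compares within-chain and between-chain variance: with m chains of length n, W is the mean within-chain variance, B is n times the variance of the chain means, and PSRF = sqrt(((n - 1)/n · W + B/n) / W), with values near 1 indicating that the chains look like draws from one distribution. The Python sketch below computes it and then applies it in a hypothetical way, treating equal sub-windows of a candidate estimation window as "chains" and growing the window backwards until PSRF exceeds a threshold. The function `track_window`, its parameters and the 1.1 threshold are assumptions for illustration; this is not the published VR-WT rule.

```python
import statistics


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for >= 2 chains of equal length n >= 2."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean([statistics.variance(c) for c in chains])  # mean within-chain variance
    B = n * statistics.variance(means)                               # between-chain variance
    if W == 0:
        return 1.0 if B == 0 else float("inf")
    var_hat = (n - 1) / n * W + B / n                                # pooled variance estimate
    return (var_hat / W) ** 0.5


def track_window(series, end, min_size=20, step=10, n_splits=4, threshold=1.1):
    """Hypothetical rule: grow the window ending just before index `end` while its
    n_splits equal sub-windows have PSRF below `threshold`.
    Returns the length of the largest accepted window (None if none is accepted)."""
    best = None
    size = min_size
    while size <= end:
        sub_len = size // n_splits
        window = series[end - sub_len * n_splits:end]  # keep the most recent points
        parts = [window[i * sub_len:(i + 1) * sub_len] for i in range(n_splits)]
        if psrf(parts) >= threshold:
            break  # sub-windows disagree: the window has likely crossed a structural break
        best = sub_len * n_splits
        size += step
    return best


if __name__ == "__main__":
    # Level shift at index 200: values 0/1 before, 100/101 after.
    series = [i % 2 for i in range(200)] + [100 + i % 2 for i in range(100)]
    print(track_window(series, 300))
```

On this toy series the window stops growing once it reaches back past the level shift at index 200, which is the behaviour a window-tracking rule is meant to capture.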

Findings

A Monte Carlo simulation study demonstrates that VR-WT accurately detects structural change points and selects appropriate window sizes. VR-WT is essential in applications where accurate estimation of model parameters, and inference about their value, sign and significance, are critical. VR-WT has also helped us understand shifts in parameter-based inference, ensuring stability across periods and highlighting how the timing and impact of market shocks vary across fields and datasets.

Originality/value

VR-WT differs from existing methods in two ways. The first is its purpose and methodology: it focuses on precise parameter estimation, and by dynamically tracking window sizes it selects flexible window sizes and enables structural changes to be visualized. The second is its broad applicability and versatility: we conducted empirical applications in three fields of study, namely the CAPM, interdependence between global stock markets, and time-dependent energy prices.
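
In the CAPM application, the parameter of interest is the asset's beta, the OLS slope of its excess returns on the market's excess returns, beta = Cov(r_i, r_m) / Var(r_m), estimated over an estimation window. The sketch below assumes the window sizes have already been chosen (for example by a procedure such as VR-WT) and re-estimates beta over each window; `betas_with_windows` and its `(t, w)` input format are hypothetical.

```python
def ols_beta(asset_excess, market_excess):
    """OLS slope (with intercept) of asset excess returns on market excess returns."""
    n = len(market_excess)
    mx = sum(market_excess) / n
    my = sum(asset_excess) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(market_excess, asset_excess))
    var = sum((x - mx) ** 2 for x in market_excess)
    return cov / var


def betas_with_windows(asset_excess, market_excess, windows):
    """windows: list of (t, w) pairs; beta at time t uses observations t-w+1 .. t inclusive."""
    betas = []
    for t, w in windows:
        lo = t - w + 1
        betas.append(ols_beta(asset_excess[lo:t + 1], market_excess[lo:t + 1]))
    return betas
```

A window that straddles a change in beta mixes the two regimes, which is why the choice of window size matters for the sign and significance of the estimate.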

Citations: 0

A novel similarity measure SF-IPF for CBKNN with implicit feedback data

IF 1.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications

Pub Date : 2024-06-04 DOI: 10.1108/dta-07-2023-0370

Rajalakshmi Sivanaiah, Mirnalinee T T, Sakaya Milton R

Purpose

The growing popularity of music streaming services increases the need to customize each service for each user in order to attract and retain customers. Most music streaming services lack explicit song ratings; they have only implicit feedback data, i.e. user listening histories. Efficient music recommendation therefore requires inferring user preferences, which is a challenging task.

Design/methodology/approach

User preferences can be identified from users' listening histories. This paper proposes a hybrid music recommendation system that infers features from users' implicit feedback and combines content-based and collaborative filtering to recommend songs. The proposed Content Boosted K-Nearest Neighbours (CBKNN) filtering technique uses users' listening histories, song popularity, song features and the songs of users with similar interests to make recommendations, with song features serving as the content features. A Song Frequency–Inverse Popularity Frequency (SF-IPF) metric is proposed to measure similarity between neighbours in collaborative filtering. The Million Song Dataset and the Echo Nest Taste Profile Subset are used as datasets.
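
The abstract names SF-IPF but does not give its formula. By analogy with TF-IDF, one plausible reading, and purely an assumption here, weights each song in a user's profile by its share of that user's plays (song frequency) times the log of the number of users divided by the number of users who played it (inverse popularity), then compares users with cosine similarity to find the K nearest neighbours. The Python sketch below implements that guess; `sf_ipf_vectors`, `cosine` and `nearest_neighbours` are hypothetical names, and the content-boosting step of CBKNN is not shown.

```python
import math
from collections import Counter


def sf_ipf_vectors(histories):
    """histories: dict user -> list of song ids played (repeats = repeated plays).

    Hypothetical TF-IDF-style weighting:
      sf(u, s) = plays of s by u / total plays by u
      ipf(s)   = log(number of users / number of users who played s)
    """
    n_users = len(histories)
    counts = {u: Counter(songs) for u, songs in histories.items()}
    df = Counter(s for c in counts.values() for s in c)  # number of users per song
    vecs = {}
    for u, c in counts.items():
        total = sum(c.values())
        vecs[u] = {s: (k / total) * math.log(n_users / df[s]) for s, k in c.items()}
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[s] for s, w in a.items() if s in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbours(vecs, user, k=2):
    """The k users most similar to `user` under the weighted cosine similarity."""
    sims = [(cosine(vecs[user], vecs[v]), v) for v in vecs if v != user]
    sims.sort(reverse=True)
    return [v for _, v in sims[:k]]
```

The inverse-popularity factor down-weights songs that almost everyone plays, so two users are judged similar mainly by the less common songs they share; a song played by every user contributes nothing.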

Findings

The proposed CBKNN technique, using the SF-IPF similarity measure to identify neighbours with similar interests, performs better than other machine learning techniques such as linear regression, decision trees, random forest, support vector machines, XGBoost and AdaBoost. SF-IPF was also compared with other similarity metrics, namely Pearson and cosine similarity, and yielded better performance.

Originality/value

The method infers user preferences from implicit feedback data and converts them into rating preferences. The study analyses the importance of combining content features with collaborative information in hybrid filtering, and formulates a new similarity metric, SF-IPF, to identify similarity between users in collaborative filtering.
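
The abstract does not say how implicit feedback is converted into ratings. One simple scheme, shown here as an assumption rather than the authors' method, ranks each song by its play count within a single user's history and maps that percentile onto a 1 to 5 scale, so that ties receive the same rating. Integer arithmetic is used to avoid floating-point rounding at bucket boundaries; `plays_to_ratings` is a hypothetical name.

```python
def plays_to_ratings(play_counts, scale=5):
    """Map one user's {song: play_count} to ratings 1..scale by within-user percentile.

    Hypothetical scheme: rating = ceil(scale * k / n), where k is the number of the
    user's songs played at most as often as this one and n is the number of songs.
    """
    counts = list(play_counts.values())
    n = len(counts)
    ratings = {}
    for song, c in play_counts.items():
        k = sum(1 for other in counts if other <= c)  # includes the song itself, so k >= 1
        ratings[song] = (scale * k + n - 1) // n      # integer ceiling of scale * k / n
    return ratings


if __name__ == "__main__":
    print(plays_to_ratings({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}))
```

Normalising within each user keeps heavy and light listeners on the same rating scale before similarities are computed.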

Citations: 0