数据预处理,用于销售代表关键绩效指标的机器分析

A. Vladova, Elena Shek
{"title":"数据预处理,用于销售代表关键绩效指标的机器分析","authors":"A. Vladova, Elena Shek","doi":"10.17323/2587-814x.2021.3.48.59","DOIUrl":null,"url":null,"abstract":"Significant transformation of the operational activity of product and service distributors is driven by changes in data-receiving and processing technology. At present, the work of these companies’ representatives is digitized to a large extent: for example, the road time, the number and places of meetings with customers are automatically recorded. At the same time, the productivity of managers who do not make direct sales is usually evaluated with the help of surveys, experts and costly double visits, although the existence of large data samples makes possible the use of statistical analysis to identify both insufficient and inflated values of performance indicators. Source data: a relational database that accumulates information about 28 categorical, quantitative, geolocation and temporal parameters of sale representatives’ activities for the last year. Based on available data, we created synthetic features (the latitude and longitude features produced the index, region, street, and house features; based upon identifiers we calculated the sum of activities of sales representatives; according to temporary features we defined the season of the year, the day of the week and the period of day features). The methodology for statistical analysis consists of three main stages: collection and processing of primary data; summary and grouping processed information; setting statistical hypotheses and interpreting the results. A probabilistic approach was used to model the level of distortion of sale representatives’ activities. As a result, with the built tag cloud we highlighted: the most popular season for advertising campaigns; the most productive departments and sale representatives; days of the week with the largest number of contacts to customers. We established a significant number of records about meetings with clients at the weekends. As a result of the data mining, we made a statistical hypothesis about the possibility of identifying the sale representatives who distort the number and parameters of meetings. A set of synthetic integer, real and categorical features was created to identify hidden relationships. Doubtful data (such as working at weekends or at night) were revealed. The resulting aggregated dataset is grouped by a sale representative’s activity ID and the distribution of this feature is plotted. For each sale representative, integer and real features are summarized and outliers that characterize inefficient performance or distortion of data have been detected. Thus, the presence of a large sample of data on the history of movements and activities allowed us to evaluate the productivity of the distribution company’s sales representatives based upon indirect features.","PeriodicalId":41920,"journal":{"name":"Biznes Informatika-Business Informatics","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data preprocessing for machine analysis of sales representatives’ key performance indicators\",\"authors\":\"A. Vladova, Elena Shek\",\"doi\":\"10.17323/2587-814x.2021.3.48.59\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Significant transformation of the operational activity of product and service distributors is driven by changes in data-receiving and processing technology. At present, the work of these companies’ representatives is digitized to a large extent: for example, the road time, the number and places of meetings with customers are automatically recorded. At the same time, the productivity of managers who do not make direct sales is usually evaluated with the help of surveys, experts and costly double visits, although the existence of large data samples makes possible the use of statistical analysis to identify both insufficient and inflated values of performance indicators. Source data: a relational database that accumulates information about 28 categorical, quantitative, geolocation and temporal parameters of sale representatives’ activities for the last year. Based on available data, we created synthetic features (the latitude and longitude features produced the index, region, street, and house features; based upon identifiers we calculated the sum of activities of sales representatives; according to temporary features we defined the season of the year, the day of the week and the period of day features). The methodology for statistical analysis consists of three main stages: collection and processing of primary data; summary and grouping processed information; setting statistical hypotheses and interpreting the results. A probabilistic approach was used to model the level of distortion of sale representatives’ activities. As a result, with the built tag cloud we highlighted: the most popular season for advertising campaigns; the most productive departments and sale representatives; days of the week with the largest number of contacts to customers. We established a significant number of records about meetings with clients at the weekends. As a result of the data mining, we made a statistical hypothesis about the possibility of identifying the sale representatives who distort the number and parameters of meetings. A set of synthetic integer, real and categorical features was created to identify hidden relationships. Doubtful data (such as working at weekends or at night) were revealed. The resulting aggregated dataset is grouped by a sale representative’s activity ID and the distribution of this feature is plotted. For each sale representative, integer and real features are summarized and outliers that characterize inefficient performance or distortion of data have been detected. Thus, the presence of a large sample of data on the history of movements and activities allowed us to evaluate the productivity of the distribution company’s sales representatives based upon indirect features.\",\"PeriodicalId\":41920,\"journal\":{\"name\":\"Biznes Informatika-Business Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2021-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biznes Informatika-Business Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17323/2587-814x.2021.3.48.59\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BUSINESS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biznes Informatika-Business Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17323/2587-814x.2021.3.48.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BUSINESS","Score":null,"Total":0}
引用次数: 0

摘要

数据接收和处理技术的变化推动了产品和服务分销商运营活动的重大转变。目前,这些公司代表的工作在很大程度上是数字化的:例如,与客户会面的道路时间、次数和地点都会自动记录下来。与此同时,不进行直销的管理人员的生产力通常是在调查、专家和昂贵的双重访问的帮助下进行评估的,尽管存在大量数据样本,因此可以使用统计分析来确定业绩指标的价值不足和虚高。来源数据:一个关系数据库,收集了去年销售代表活动的28个分类、定量、地理位置和时间参数的信息。根据现有数据,我们创建了合成特征(纬度和经度特征产生了索引、区域、街道和房屋特征;基于标识符,我们计算了销售代表的活动总和;根据临时特征,我们定义了一年中的季节、一周中的哪一天和一天中的某一时段特征)。统计分析方法包括三个主要阶段:收集和处理原始数据;对处理后的信息进行汇总和分组;设置统计假设并解释结果。采用概率方法对销售代表活动的扭曲程度进行建模。因此,通过构建标签云,我们强调了:最受欢迎的广告季;生产效率最高的部门和销售代表;一周中与客户联系人数最多的几天。我们建立了大量关于周末与客户会面的记录。作为数据挖掘的结果,我们对识别扭曲会议次数和参数的销售代表的可能性进行了统计假设。创建了一组综合整数、实数和分类特征来识别隐藏的关系。可疑数据(如周末或夜间工作)被披露。生成的聚合数据集按销售代表的活动ID分组,并绘制该特征的分布图。对于每个销售代表,总结整数和真实特征,并检测到表征低效性能或数据失真的异常值。因此,大量流动和活动历史数据的存在使我们能够根据间接特征评估分销公司销售代表的生产力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Data preprocessing for machine analysis of sales representatives’ key performance indicators
Significant transformation of the operational activity of product and service distributors is driven by changes in data-receiving and processing technology. At present, the work of these companies’ representatives is digitized to a large extent: for example, the road time, the number and places of meetings with customers are automatically recorded. At the same time, the productivity of managers who do not make direct sales is usually evaluated with the help of surveys, experts and costly double visits, although the existence of large data samples makes possible the use of statistical analysis to identify both insufficient and inflated values of performance indicators. Source data: a relational database that accumulates information about 28 categorical, quantitative, geolocation and temporal parameters of sale representatives’ activities for the last year. Based on available data, we created synthetic features (the latitude and longitude features produced the index, region, street, and house features; based upon identifiers we calculated the sum of activities of sales representatives; according to temporary features we defined the season of the year, the day of the week and the period of day features). The methodology for statistical analysis consists of three main stages: collection and processing of primary data; summary and grouping processed information; setting statistical hypotheses and interpreting the results. A probabilistic approach was used to model the level of distortion of sale representatives’ activities. As a result, with the built tag cloud we highlighted: the most popular season for advertising campaigns; the most productive departments and sale representatives; days of the week with the largest number of contacts to customers. We established a significant number of records about meetings with clients at the weekends. As a result of the data mining, we made a statistical hypothesis about the possibility of identifying the sale representatives who distort the number and parameters of meetings. A set of synthetic integer, real and categorical features was created to identify hidden relationships. Doubtful data (such as working at weekends or at night) were revealed. The resulting aggregated dataset is grouped by a sale representative’s activity ID and the distribution of this feature is plotted. For each sale representative, integer and real features are summarized and outliers that characterize inefficient performance or distortion of data have been detected. Thus, the presence of a large sample of data on the history of movements and activities allowed us to evaluate the productivity of the distribution company’s sales representatives based upon indirect features.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
33.30%
发文量
0
期刊最新文献
Modeling and optimization of strategies for making individual decisions in multi-agent socio-economic systems with the use of machine learning An intelligent method for generating a list of job profile requirements based on neural network language models using ESCO taxonomy and online job corpus Decision support technology for a seller on a marketplace in a competitive environment The present and future of the digital transformation of real estate: A systematic review of smart real estate A knowledge management system in the strategic development of universities
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1