Data preprocessing for machine analysis of sales representatives’ key performance indicators

Biznes Informatika-Business Informatics (IF 0.5, Q4, BUSINESS). Pub Date: 2021-09-30. DOI: 10.17323/2587-814x.2021.3.48.59

A. Vladova, Elena Shek


Citations: 0


Data preprocessing for machine analysis of sales representatives’ key performance indicators

Changes in data acquisition and processing technology are driving a significant transformation in the operations of product and service distributors. The work of these companies’ representatives is now largely digitized: for example, travel time and the number and locations of customer meetings are recorded automatically. Yet the productivity of managers who do not make direct sales is still usually evaluated through surveys, expert assessments and costly double visits, even though large data samples make it possible to use statistical analysis to identify both insufficient and inflated values of performance indicators. Source data: a relational database that accumulates information on 28 categorical, quantitative, geolocation and temporal parameters of sales representatives’ activities over the past year. From the available data we created synthetic features: latitude and longitude yielded the postal index, region, street and house features; from identifiers we calculated the sum of sales representatives’ activities; from temporal features we derived the season of the year, the day of the week and the period of the day. The methodology for statistical analysis consists of three main stages: collecting and processing primary data; summarizing and grouping the processed information; and formulating statistical hypotheses and interpreting the results. A probabilistic approach was used to model the level of distortion in sales representatives’ activities. Using a tag cloud, we highlighted the most popular season for advertising campaigns, the most productive departments and sales representatives, and the days of the week with the largest number of customer contacts. We found a significant number of records of meetings with clients on weekends.
As a result of the data mining, we formulated a statistical hypothesis that sales representatives who distort the number and parameters of meetings can be identified. A set of synthetic integer, real and categorical features was created to reveal hidden relationships. Doubtful data (such as work at weekends or at night) were identified. The resulting aggregated dataset is grouped by the sales representative’s activity ID, and the distribution of this feature is plotted. For each sales representative, integer and real features are summarized, and outliers that indicate inefficient performance or data distortion are detected. Thus, a large sample of data on the history of movements and activities allowed us to evaluate the productivity of the distribution company’s sales representatives from indirect features.
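The preprocessing steps named in the abstract (temporal synthetic features, flagging of doubtful records, per-representative aggregation and outlier detection) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the hour boundaries for the periods of the day, the Northern Hemisphere season mapping, and the use of Tukey's interquartile-range rule, which stands in for whatever outlier criterion the authors actually applied.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def season_of(month: int) -> str:
    """Map a month to a season (Northern Hemisphere; an assumption)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"

def period_of_day(hour: int) -> str:
    """Hypothetical hour boundaries; the abstract does not give its own."""
    if hour < 6 or hour >= 22:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def temporal_features(ts: datetime) -> dict:
    """Derive season, weekday and period of day; flag weekend or night work."""
    period = period_of_day(ts.hour)
    is_weekend = ts.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
    return {
        "season": season_of(ts.month),
        "weekday": WEEKDAYS[ts.weekday()],
        "period": period,
        "doubtful": is_weekend or period == "night",
    }

def activities_per_rep(records) -> dict:
    """Count activities for each representative from (rep_id, timestamp) pairs."""
    totals = defaultdict(int)
    for rep_id, _ts in records:
        totals[rep_id] += 1
    return dict(totals)

def flag_outliers(totals: dict, k: float = 1.5) -> dict:
    """Label totals outside Tukey's fences as insufficient or inflated."""
    q1, _median, q3 = quantiles(totals.values(), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    flags = {}
    for rep_id, total in totals.items():
        if total < low:
            flags[rep_id] = "insufficient"
        elif total > high:
            flags[rep_id] = "inflated"
    return flags
```

A flagged record or representative is only a candidate for review: a weekend meeting or an unusually high total may be legitimate, so the labels point at cases worth a closer look without proving distortion.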


Source journal

Biznes Informatika-Business Informatics BUSINESS-

Self-citation rate

33.30%

Number of published articles