Proceedings of the 2019 International Conference on Data Mining and Machine Learning: Latest Papers

Demand Forecasting Based on Machine Learning for Mass Customization in Smart Manufacturing
Myungsoo Kim, Jongpil Jeong, Sang-Pil Bae
Mass customization is essential for smart manufacturing. In particular, generating demand forecasts is undoubtedly the most important task in any industry. Appropriate demand forecasts improve the quality of sales and operations planning (S&OP), which contributes greatly to overall corporate management. They also make it possible to hold appropriate stock levels and so reduce the cost of maintaining multiple warehouses. In this paper, we examine why mass customization is needed in smart manufacturing and identify suitable demand forecasting techniques by comparing the traditional time-series technique ARIMA with a nonlinear network model. Afterwards, an algorithm is developed that evaluates the sales process by assessing the expected inventory through mathematical modelling and finalizing the production plan.
DOI: https://doi.org/10.1145/3335656.3335658; published 2019-04-28
Citations: 11
Assistant Decision-making Method of "NIMBY" Crisis Conversion in Waste Incineration Based on "Reputation and Benefit Space"
Enyuan Liu, Minxuan Li, Shengya Liu
Waste-incineration power generation, a waste disposal method that achieves "reduction, harmlessness and resource utilization", is an important measure for improving the national well-being index and for achieving the goal of an overall well-off society. In practice, however, such projects are difficult to site because of "NIMBY" (not in my backyard) opposition. To address this social problem scientifically and quantitatively, this paper constructs a network case analysis method based on a reputation and benefit space, and uses case clustering to abstract cluster centers with management significance, from which the reputation and benefit space is derived. On this basis, a decision-support method based on similarity calculation is constructed to support the conversion of "NIMBY" crises.
DOI: https://doi.org/10.1145/3335656.3335686; published 2019-04-28
Citations: 1
Research on Code Plagiarism Detection Model Based on Random Forest and Gradient Boosting Decision Tree
Huang Qiubo, Tang Jingdong, Fang Guo-zheng
This paper studies Online Judge systems used for assignments such as programming exercises. Code submitted by students sometimes contains plagiarism [1]. In addition to calculating the similarity between submissions, we extract other features to determine whether a submitted code is suspected of plagiarism. By combining Random Forest and Gradient Boosting Decision Tree models, we can also obtain its suspicion level. The model first calculates the similarity between the newly submitted code and all previously submitted codes and flags plagiarism suspects. For codes where plagiarism is difficult to confirm, we extract the programming-style similarity, the student's submission behavior pattern (such as the similar-target concentration degree), and other features to build decision-tree models such as Random Forest and Gradient Boosting Decision Trees, which help determine the level of plagiarism suspicion. If the level is medium, the teacher marks the code as plagiarized or not. Finally, the learning model is incrementally trained to improve its accuracy and classification results. Experimental results show that the accuracy can reach 95.9%. As a result, the model can deter students from plagiarizing while minimizing the teacher's workload.
DOI: https://doi.org/10.1145/3335656.3335692; published 2019-04-28
Citations: 1
Joint Transceiver Design for Fully-Duplex Cloud-Access DEINs
Qin Yu, Junliang Yu, Jie Hu, Kun Yang, Taijun Wang, Rongsheng Ding
With the rapid development of communication technology, the demand for higher data rates in communication networks keeps growing, and energy consumption is becoming an increasingly serious problem. Data and Energy Integrated communication Networks (DEINs) can transmit information and energy to a terminal simultaneously, which greatly improves terminal convenience and makes battery-free devices possible in the future. This paper studies the joint design of transceivers in a full-duplex cloud-access DEIN. The system model considers both uplink and downlink users. Because resources must be allocated jointly across the uplink and downlink, the model incorporates full-duplex operation and the self-interference it causes. The optimization goal is to minimize total power consumption subject to uplink and downlink signal-to-interference-plus-noise ratio (SINR) constraints and energy-harvesting (EH) constraints. For this non-convex optimization problem, an algorithm combining zero-forcing (ZF) beamforming and maximum-ratio transmission (MRT) beamforming is proposed. In this hybrid beamforming algorithm, the ZF beamformer and the MRT beamformer are linearly combined, which reduces the optimization of the downlink beam vector to the optimization of the combination ratio. Simulation results show that the power consumed in the half-duplex scenario is higher than that in the full-duplex scenario, and that the running time of the hybrid beamforming algorithm does not change as the number of RRH antennas increases.
DOI: https://doi.org/10.1145/3335656.3335691; published 2019-04-28
Citations: 0
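
The linear combination of ZF and MRT beamformers described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's formulation: a hypothetical 2-user, 2-antenna downlink channel `H`, a single scalar ratio `alpha`, and the helper names `hybrid_beam` and `gain`. The power minimization, SINR and EH constraints, and self-interference are not modeled.

```python
# Hypothetical 2-user x 2-antenna downlink channel; row k is user k's channel.
H = [[1 + 1j, 0.5 - 0.2j],
     [0.3 + 0.4j, 1 - 0.5j]]

def inverse_2x2(h):
    """Closed-form inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / norm for x in v]

def hybrid_beam(h, user, alpha):
    """Unit-norm beam for `user`: alpha * ZF + (1 - alpha) * MRT (assumed form)."""
    # MRT: conjugate of the user's own channel, maximizes the desired gain.
    mrt = normalize([x.conjugate() for x in h[user]])
    # ZF: column `user` of the channel inverse, nulls the other user.
    inv = inverse_2x2(h)
    zf = normalize([inv[0][user], inv[1][user]])
    return normalize([alpha * z + (1 - alpha) * m for z, m in zip(zf, mrt)])

def gain(h_row, w):
    """Magnitude of the signal a user with channel h_row receives from beam w."""
    return abs(sum(x * y for x, y in zip(h_row, w)))

for alpha in (0.0, 0.5, 1.0):
    w = hybrid_beam(H, 0, alpha)
    print(alpha, round(gain(H[0], w), 4), round(gain(H[1], w), 4))
```

With `alpha = 1` the beam is pure ZF and the interference leaked to the other user vanishes; with `alpha = 0` it is pure MRT and the desired-user gain is maximal. Sweeping the one scalar between these extremes is what reduces the beam-vector search to a search over the combination ratio.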
The extraction research on evaluation rules for students based on discernibility matrix
Fan Yan-ying, Zhang Zi-min, Chen Guan-ping, Zheng Shi-yong
We apply rough set theory to the student evaluation process. First, we build the evaluation decision information table and use the discernibility matrix of rough set theory to perform attribute reduction on the evaluation data, thereby removing unnecessary evaluation indicators. On this basis, we perform value reduction and rule extraction, and then mine general evaluation rules from the large volume of evaluation data to provide a decision basis for student work at school. The evaluation process lets the data speak and reduces the influence of human factors, so the evaluation outcome is more objective and fair.
DOI: https://doi.org/10.1145/3335656.3335680; published 2019-04-28
Citations: 0
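
To make the discernibility matrix mentioned in the abstract above concrete, here is a minimal sketch of the standard rough-set construction on a decision table. The toy student records, attribute names, and function names are hypothetical and not taken from the paper; the value reduction and rule extraction steps are not shown.

```python
from itertools import combinations

def discernibility_matrix(rows, condition_attrs, decision_attr):
    """Map each pair (i, j) of rows with different decisions to the set of
    condition attributes on which the two rows differ."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][decision_attr] != rows[j][decision_attr]:
            matrix[(i, j)] = {
                a for a in condition_attrs if rows[i][a] != rows[j][a]
            }
    return matrix

def core_attributes(matrix):
    """Attributes that appear as singleton entries: dropping any of them
    would make some pair of rows indistinguishable."""
    return {next(iter(attrs)) for attrs in matrix.values() if len(attrs) == 1}

# Hypothetical evaluation table (illustrative only).
students = [
    {"attendance": "high", "homework": "good", "exam": "pass", "rating": "excellent"},
    {"attendance": "high", "homework": "good", "exam": "fail", "rating": "average"},
    {"attendance": "low",  "homework": "poor", "exam": "fail", "rating": "poor"},
    {"attendance": "high", "homework": "poor", "exam": "fail", "rating": "poor"},
]
conditions = ["attendance", "homework", "exam"]

matrix = discernibility_matrix(students, conditions, "rating")
core = core_attributes(matrix)
print(matrix)
print(core)
```

In this toy table `exam` and `homework` form the core, while `attendance` never appears alone in an entry, which marks it as a candidate for removal during attribute reduction.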
Spatio-temporal Changes of Tourists Based on Multi-source data in Chengdu
R. Yuan
The popularity of the mobile internet accelerates the dissemination and exchange of information and changes the way tourists obtain it. Tourists no longer rely on officially published travel brochures and TV programs for tourism information. Through Twitter, Sina Weibo, Facebook and other We-Media channels, they can get first-hand information about a destination. Large volumes of GPS trajectory data, such as taxi trajectory data and mobile signaling data, are generated by ubiquitous GPS sensors and have been widely used in traffic and resident travel research. Because tourists are unfamiliar with the road layout and traffic rules of the destination city, taxis are an important mode of travel for non-local tourists, and taxi OD (origin-destination) points reflect tourists' travel needs and characteristics. This paper therefore applies taxi data to tourism research. In our study, the CFSDPF clustering algorithm is used to cluster Sina Weibo data into tourism ROIs (regions of interest), and the tourism ROIs are then used to cluster taxi OD data. Multi-source data can reflect the travel characteristics of tourists fully and accurately. At two scales, the whole city and the central city, we comprehensively analyze the relationship between the travel characteristics of tourists in Chengdu and the tourism ROIs.
DOI: https://doi.org/10.1145/3335656.3335696; published 2019-04-28
Citations: 1
Simulation for Agglomeration Effect of Internet Crowdfunding Model
Yunjie Ji, YanXia Zhu
Crowdfunding has become an important channel for the transformation of innovation achievements, and how to promote its healthy and rapid development is a hot topic of academic research. This paper simulates the agglomeration effect in the development of the crowdfunding model using a multi-agent system. It finds that properly supporting superior enterprises or high-quality projects, and concentrating resources to stimulate innovation and transformation, help raise the overall development level of the crowdfunding system without reducing the stability of system operation.
DOI: https://doi.org/10.1145/3335656.3335682; published 2019-04-28
Citations: 0
The Design of Word Cloud Rendering Platform and Its Application on Measuring Systematic Financial Risks
Shifen Wang, Yining Sun
With the development of the Internet, the amount of data produced daily keeps increasing, and so does the value contained in the data. Meanwhile, the difficulty of data mining and the complexity of data analysis are rising sharply. A new data processing system is urgently needed, especially in the macroeconomic field. A word cloud is a popular way to visualize hot topics. We first describe the design of a distributed, batch-based website word cloud rendering platform, which combines big-data processing with traditional web crawler design to collect all the information on a website and present it as a word cloud. The platform is then put into practice and applied to the measurement of systemic financial risks.
DOI: https://doi.org/10.1145/3335656.3335698; published 2019-04-28
Citations: 0
Innovation of Data Mining & Screening System under Big Data: Take a case as NIMBY
Minxuan Li
With the growing maturity of web crawler technology and the advent of the big data era, researchers can obtain all the data related to a problem directly through web crawlers and other means. It is more important, however, to mine and filter that data to obtain what is valuable for the research. Based on a keyword word list, this study uses a CRN network to construct a semantic distance table and a TOPSIS evaluation system to rank the data, so that researchers can obtain quantitatively screened data with research value and have a scientific screening method.
DOI: https://doi.org/10.1145/3335656.3335688; published 2019-04-28
Citations: 0
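
The TOPSIS ranking step named in the abstract above can be sketched as follows. This is the textbook TOPSIS procedure, not the paper's evaluation system: the two criteria (a relevance score to maximize and a semantic distance to minimize), the weights, and the sample values are all hypothetical, and the construction of the semantic distance table is not shown.

```python
def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per row (higher is better).

    matrix:  rows are candidates, columns are criteria
    weights: one weight per criterion
    benefit: True if larger is better for that criterion, False if smaller
    Assumes no all-zero column and at least two distinct rows.
    """
    n_cols = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [sum(row[j] ** 2 for row in matrix) ** 0.5 for j in range(n_cols)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    def distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    scores = []
    for row in weighted:
        d_pos, d_neg = distance(row, ideal), distance(row, anti)
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical crawled items: [relevance score, semantic distance to keywords].
items = [[0.9, 0.2], [0.5, 0.5], [0.1, 0.9]]
scores = topsis(items, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Items are ranked by their relative closeness to the ideal point, so a cutoff on the score gives a quantitative screening rule.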
Gene clustering, enrichment and survival analysis of differentially expressed genes in Low Grade Glioma between different genders by big data analysis
Jianzhi Deng, Xiaohui Cheng, Yuehan Zhou
In this study, we aim to reveal the relationship between differentially expressed mRNAs and gender in low grade glioma (LGG) patients. Using RNA-seq files of LGG patients downloaded from The Cancer Genome Atlas (TCGA) database, 89 mRNAs differentially expressed between males and females were identified by clustering analysis. These mRNAs were analyzed with the DAVID and KOBAS online tools and were enriched in 67 gene ontology terms, covering the cellular component, biological process and molecular function groups, and in 7 signaling pathways. The differentially expressed mRNAs were then divided into two groups according to expression level. The high-expression and low-expression groups were analyzed against the clinical survival data using the Kaplan-Meier method and kmplot survival curves. Compared with female LGG patients, male patients with differentially expressed mRNAs were closely associated with a higher risk of LGG.
DOI: https://doi.org/10.1145/3335656.3335699; published 2019-04-28
Citations: 9
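
The Kaplan-Meier method referred to in the abstract above estimates a survival curve from follow-up times with censoring. The sketch below is the standard product-limit estimator in plain Python; the follow-up times and event flags are made-up illustrative values, not TCGA data, and the function name is hypothetical. In a study like the one described, it would be run once per expression group and the resulting curves compared.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time per patient
    events:    True if death was observed at that time, False if censored
    Returns [(time, survival_probability)] at each distinct observed-event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [True, True, False, True, False])
print(curve)
```

Censored patients lower the number at risk without triggering a drop in the curve, which is why the estimator can use incomplete follow-up.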