Developing Big Data Curriculum with Open Source Infrastructure (Abstract Only)

Anurag Nagar
Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, published 2017-03-08
DOI: 10.1145/3017680.3022386 (https://doi.org/10.1145/3017680.3022386)
Citations: 3

Abstract

This lightning talk will focus on our experience of developing and managing large undergraduate and graduate Big Data courses. The demand for trained professionals in the field of Big Data technologies is huge, and there is an urgent need to develop and update courses in this area. One of the biggest hurdles for many schools is the establishment, maintenance, and constant updating of high-performance computing infrastructure. Further, the technology landscape for Big Data is constantly evolving, and newer technologies, such as Apache Spark, require significant expenditure to set up and upgrade at the cluster level. Traditional infrastructure at most higher education institutions is insufficient for this, and cannot scale up to meet the demands of large class sizes and multiple simultaneous sessions. In this lightning talk, we will share our experience of running large undergraduate and graduate Big Data courses using open source infrastructure. Some of this infrastructure is cloud-based, while other parts require students to create a virtualized environment on their personal computers. Both types of resources are freely available, easy to set up, and provide students with enough computational power to run most academic tasks and projects. We will provide specific examples of using such technologies for common tasks, such as setting up a distributed file system, running MapReduce algorithms on large datasets, performing large-scale machine learning and graph mining using Apache Spark, and maintaining a high-availability Cassandra instance.