Apache Spark 大数据入门

Introduction to Big Data with Apache Spark

Learn how to apply data science techniques using parallel programming in Apache Spark to explore big (and small) data.



  • 分类: 计算机
  • 平台: edX
  • 语言: 英语


Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (part of Apache Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.


Learn how to use Apache Spark to perform data analysis
How to use parallel programming to explore data sets
Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real world data questions
Prepare for the Spark Certified Developer exam

  • Coursera
  • edX
  • OpenLearning
  • FutureLearn
  • iversity
  • Udacity
  • NovoEd
  • Canvas
  • Open2Study
  • Google
  • ewant
  • FUN
  • IOC-Athlete-MOOC
  • World-Science-U
  • Codecademy
  • CourseSites
  • opencourseworld
  • ShareCourse
  • gacco
  • MiriadaX
  • openhpi
  • Stanford-Open-Edx
  • 网易云课堂
  • 中国大学MOOC
  • 学堂在线
  • 顶你学堂
  • 华文慕课
  • 好大学在线CnMooc

Copyright © 2008-2015 MOOC.CN 慕课改变你,你改变世界