SparkSQL and DataFrames with PySpark

July 31 @ 9:30 am - 12:00 pm

Modern Languages Building (MLB), Room 2001A

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. Industry has quickly adopted Spark and deployed it at scale for processing big data. Its main advantages include in-memory processing and a rich set of operations for wrangling data using DataFrames.

In this workshop, we’ll introduce attendees to SparkSQL and DataFrames for basic data manipulation, file I/O, and SQL querying. Spark has language bindings for R, Python, Scala and Java. We’ll be using PySpark (the Python API) in our workshop. The workshop is intended for users with INTERMEDIATE knowledge of R, Python, or a comparable language. Attendees should be familiar with DataFrames in Python (pandas) or R (dplyr). Attendees will NEED to have a Flux account beforehand to participate.

Details

Date:
July 31
Time:
9:30 am - 12:00 pm
Organizer

CSCAR
Email:
cscar@umich.edu
Website:
cscar.research.umich.edu

Other

Presenter(s)
Alex Cao, CSCAR