
SparkSQL and DataFrames with PySpark
November 13, 2018 @ 2:30 pm - 5:00 pm
Rackham Building, Earl Lewis Room, 3rd Floor East
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. Industry has quickly adopted Spark and deployed it at scale for processing big data. Its main advantages include in-memory processing and a rich set of operations for wrangling data using DataFrames.
In this workshop, we’ll introduce attendees to SparkSQL and DataFrames for basic data manipulation, file I/O, and SQL querying. Spark has language bindings for R, Python, Scala, and Java; we’ll be using PySpark (the Python API) in this workshop. The workshop is intended for users with INTERMEDIATE knowledge of R, Python, or a comparable language. Attendees should be familiar with DataFrames in Python (pandas) or R (dplyr).
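For a rough sense of the three topics named above (DataFrame manipulation, file I/O, and SQL querying), here is a minimal PySpark sketch. It is not taken from the workshop materials: the sample rows, the column names, the `people` view, the `people_parquet` output path, and the `run_demo` helper are all hypothetical, and it assumes a local Spark installation rather than the Cavium cluster.

```python
# Illustrative preview only; data, names, and paths are hypothetical.
SAMPLE_ROWS = [
    ("Ann", "Statistics", 34),
    ("Bo", "Physics", 28),
    ("Cy", "Statistics", 41),
]
COLUMNS = ["name", "dept", "age"]
QUERY = "SELECT dept, COUNT(*) AS n FROM people GROUP BY dept ORDER BY dept"


def run_demo(out_dir="people_parquet"):
    """Run the three steps on a local Spark session (requires PySpark)."""
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumption: local mode; a cluster would use its own master setting.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("sparksql-preview")
        .getOrCreate()
    )
    try:
        # 1. Basic data manipulation with the DataFrame API.
        df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
        df.filter(F.col("age") > 30).select("name", "age").show()

        # 2. File I/O: write the DataFrame as Parquet, then read it back.
        df.write.mode("overwrite").parquet(out_dir)
        people = spark.read.parquet(out_dir)

        # 3. SQL querying: register a temporary view and query it.
        people.createOrReplaceTempView("people")
        rows = spark.sql(QUERY).collect()
        return [(r["dept"], r["n"]) for r in rows]
    finally:
        spark.stop()
```

Calling `run_demo()` on a machine with PySpark installed prints the filtered rows and returns one `(dept, n)` pair per department. The same query could also be written with `groupBy("dept").count()`, which may feel more familiar to pandas or dplyr users.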
Attendees will NEED to have a Cavium account beforehand to participate. http://myumi.ch/6pn5d
Attendees will also need to bring their own laptops, as computers will not be provided.