
SparkSQL and DataFrames with PySpark

November 13 @ 2:30 pm - 5:00 pm

Rackham Building, Earl Lewis Room, 3rd Floor East

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. Industry has quickly adopted Spark and deployed it at scale for processing big data. Its main advantages include in-memory processing and a rich set of operations for wrangling data using DataFrames.

In this workshop, we’ll introduce attendees to SparkSQL and DataFrames for basic data manipulation, file I/O, and SQL querying. Spark has language bindings for R, Python, Scala, and Java; we’ll be using PySpark (the Python API) in our workshop. The workshop is intended for users with INTERMEDIATE knowledge of R, Python, or a comparable language. Attendees should be familiar with DataFrames in Python (pandas) or R (dplyr).

Attendees will NEED to have a Cavium account beforehand to participate. http://myumi.ch/6pn5d

Attendees will also need to bring their own laptops; computers will not be provided.

Details

Date:
November 13
Time:
2:30 pm - 5:00 pm
Venue

Rackham Building, Earl Lewis Room, 3rd Floor East
915 E. Washington St.
Ann Arbor, MI 48109 United States

Organizer

CSCAR
Email:
cscar@umich.edu
Website:
cscar.research.umich.edu

Other

Presenters
Alex Cao, Ellen Paquet and Seth Meyer