
Hadoop and Spark Workshop
November 2, 2017 @ 1:00 pm - 4:00 pm
East Hall B254
Overview
Learn how to process large amounts (up to terabytes) of data using SQL and/or simple programming models available in Python, Scala, and Java. Computers will be provided to follow along with hands-on examples; users can also bring laptops.
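To give a flavor of the "simple programming models" mentioned above, here is a minimal, illustrative sketch of the map/shuffle/reduce pattern that Hadoop and Spark build on. This is plain Python for illustration only, not the actual cluster APIs used in the workshop:

```python
# Word count via the map/shuffle/reduce pattern, sketched in plain Python.
# On a real cluster, each phase would run in parallel across many machines.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key (word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop and Spark", "Spark and SQL"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'hadoop': 1, 'and': 2, 'spark': 2, 'sql': 1}
```

The same computation can be expressed as a few lines of Spark code or a SQL `GROUP BY`; the workshop covers those cluster-backed versions hands-on.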
Prerequisites
Intro to the Linux Command Line or equivalent. This course assumes familiarity with the Linux command line.
A user account on Flux. If you do not have a Flux user account, apply for one at: https://arc-ts.umich.edu/fluxform/
Duo authentication.
Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH password as well as authenticate through Duo in order to access Flux.
If you need to enroll in Duo, follow the instructions at Getting Started: How to Enroll in Duo.
Hadoop queue membership. If you did not ask to be put on the training Hadoop queue when applying for a Flux user account, send an email to hpc-support@umich.edu asking to be put on the training queue.
Instructor
Brock Palen
Associate Director
ARC-TS
Brock has over 10 years of experience with high performance computing and data-intensive computing in an academic environment. He currently works with the team at ARC-TS to provide HPC, data science, storage, and other research computing services to the University. Brock is also the Campus Champion for the NSF XSEDE project, representing the school to this and other national computing infrastructures and organizations.
Materials
- The presentation is at https://umarcts.bitbucket.io/presentations/hadoop/#/title (Visit our Version Control page for information on using Bitbucket)
Course Preparation
In order to participate successfully in the class exercises, you must have a Flux user account, be enrolled in Duo two-factor authentication, and be added to a Hadoop queue. The user account allows you to log in to the cluster; create, compile, and test applications; and transfer data into Hadoop's filesystem for processing. The Hadoop queue allows you to submit jobs that execute those applications in parallel on the cluster.
Flux user account
A single Flux user account can be used to prepare and submit jobs using various allocations. If you already possess a user account, you can use it for this course and skip to "Hadoop queue" below. If not, please visit https://arc-ts.umich.edu/fluxform to obtain one. A user account is free to members of the University community. Please note that obtaining an account requires human processing, so be sure to do this at least two business days before class begins.
Hadoop queue
We’ll add you to the training queue so you can run jobs on the cluster during the course. If you already have access to a Hadoop queue, you may use that instead.
More help
Please email hpc-support@umich.edu for questions, comments, or to seek further assistance.