Introduction to the Flux cluster and batch computing

Overview

This workshop will provide a brief overview of the components of the Flux cluster. The main body of the workshop will cover the resource manager and scheduler, creating submission scripts to run jobs and the options available in them, and hands-on experience. By the end of the workshop, every participant should have created a submission script, submitted a job, tracked its progress, and collected its output. Participants will have several working examples from which to build their own submission scripts in their home directories.
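
For orientation, a minimal Torque/PBS submission script of the kind built in class might look like the sketch below; the job name, allocation, queue, and email address are placeholders, and the exact directives to use are covered during the workshop.

    #!/bin/bash
    #PBS -N example_job                # job name (placeholder)
    #PBS -A example_flux               # allocation to charge (placeholder)
    #PBS -q flux                       # queue (placeholder)
    #PBS -l nodes=1:ppn=1,mem=1gb,walltime=00:10:00
    #PBS -m abe                        # email on abort, begin, and end
    #PBS -M uniqname@umich.edu         # placeholder address
    #PBS -V                            # export the current environment to the job

    cd "$PBS_O_WORKDIR"                # start in the directory the job was submitted from
    echo "Running on $(hostname)"

Such a script is submitted with qsub and tracked with qstat; both commands are demonstrated in class.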

Prerequisites

Obtain a user account on Flux. If you do not have a Flux user account, apply for one at the account application page: https://arc-ts.umich.edu/fluxform/

Register for Duo authentication. See below for details.

This course assumes familiarity with the Linux command line, such as might be gained from the CSCAR/ARC-TS workshop Introduction to the Linux Command Line. In particular, participants should understand how files and folders work, be able to create text files using the nano editor, be able to create and remove files and folders, and understand what input and output redirection are and how to use them.
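
As a quick self-check, the short shell session below exercises the assumed skills (creating and removing files and folders, editing with nano, and redirecting input and output); all file and folder names are arbitrary examples.

    mkdir practice && cd practice        # create a folder and move into it
    nano notes.txt                       # create and edit a text file in nano
    echo "first line"  > output.txt      # redirect output to a file (overwrites)
    echo "second line" >> output.txt     # append to the same file
    sort < output.txt                    # redirect a file into a command's input
    ls -l > listing.txt 2>&1             # capture both standard output and standard error
    cd .. && rm -r practice              # remove the folder and its contents when done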

Instructor

Bennet Fauber
Scientific Applications Analyst
Advanced Research Computing — Technology Services

Bennet Fauber is a member of the technical team that administers the Flux cluster.

Materials

In-class handouts

Supplementary materials

Course Preparation

In order to participate successfully in the workshop exercises, you must have a Flux user account and be registered for the Duo authentication service. The user account allows you to log in to the cluster, create, compile, and test applications, and prepare Flux jobs for submission. Information about registering for Duo is at the ITS Two Factor Authentication web page.

Flux user account

A single Flux user account can be used to prepare and submit jobs using various allocations. If you already possess a user account, you can use it for this course and skip to “Flux allocation” below. If not, please visit https://arc-ts.umich.edu/fluxform to obtain one. A user account is free to members of the University community. Please note that obtaining an account requires human processing, so be sure to do this at least two business days before class begins.

Flux allocation

We create an allocation for the workshop so you can run jobs on the cluster during the workshop and for one day after for those who would like additional practice. The workshop allocation is quite limited and is intended only to run examples to help you cement the details of job submission and management. If you already have an existing Flux allocation, you can use that, though if there are any issues with that allocation, we will ask you to use the workshop allocation.

Laptop Preparation

The classroom contains Windows computers, which require your uniqname and UMICH password to log in, and which have all necessary software preloaded.

If you want to use a laptop for the course, you are welcome to do so. However, if there are problems connecting your laptop, you will be asked to switch to the provided computer for the class. We will try to help resolve issues after the workshop.

If you want to try to prepare your laptop for use in the class, please see our web page on Preparing your laptop to use Flux. If you have a Windows computer, we recommend that you install PuTTY/WinSCP from the Blue Disc web site. Macintosh computers typically have the minimum requirements installed as part of the system.

More help

Please use our class mailing list hpc101course@umich.edu for questions, comments, or to seek further assistance.

Sliding into Slurm: An early look at U-M’s new high-performance computing environment

This workshop will provide a brief overview of the new HPC environment and is intended for current Flux and Armis users. We will use the temporary Beta HPC cluster to demonstrate how jobs will be submitted and managed under the new Great Lakes, Armis2, and Lighthouse clusters available later this year.

There are many differences between the familiar Flux environment and that of the new HPC clusters, including a new batch scheduling system, a new interactive batch job environment, a new HPC web portal, a new module environment, and a new on-demand-only job accounting system.

We will cover these differences in the workshop, and provide hands-on training in creating and running job submission scripts in the new HPC environment.  Students are expected to be conversant with the Linux command line and have experience in creating, submitting, and troubleshooting PBS batch scripts.
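
As a preview of the material, a minimal Slurm batch script might look like the sketch below; the account and partition names are placeholders rather than Beta defaults, and the workshop covers the full set of options.

    #!/bin/bash
    #SBATCH --job-name=example            # job name (placeholder)
    #SBATCH --account=example_account     # account to charge (placeholder)
    #SBATCH --partition=standard          # partition/queue (placeholder)
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --mem=1g
    #SBATCH --time=00:10:00
    #SBATCH --mail-type=BEGIN,END,FAIL    # email on start, completion, and failure

    echo "Running on $(hostname)"

The script is submitted with sbatch and monitored with squeue, the rough counterparts of qsub and qstat under Torque.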

Introduction to the Linux Command Line

This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also generically referred to as “the command line”. Topics include: a brief overview of Linux, the Bash shell, navigating the file system, basic commands, shell redirection, permissions, processes, and the command environment. The workshop will also provide a quick introduction to nano, a simple text editor that will be used in subsequent workshops to edit files.
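
The commands below give a flavor of what is covered; the file names are hypothetical examples, not course materials.

    pwd                         # print the current working directory
    ls -l                       # list files with permissions, owner, size, and date
    cd ~ && mkdir demo          # return to the home directory and create a folder
    chmod u+x myscript.sh       # give the owner execute permission on a (hypothetical) file
    ps -u $USER                 # list your running processes
    env | sort | less           # page through the command environment
    nano demo/notes.txt         # open (or create) a file in the nano editor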

Instructor

Kenneth Weiss
IT Project Senior Manager
Medical School Information Services (MSIS)

Ken is a High Performance Computing Consultant in the Computational Research Consulting Division of MSIS at the University of Michigan. He works with a team of IT specialists to provide high performance computing support and training for the Medical School. Prior to this, he spent 21 years managing research computing, including an HPC cluster, for Dr. Charles Sing in the Human Genetics Department.

Materials

Course Preparation

You must register at least three full days prior to the event so that we have time to ensure you have the proper U-M credentials for the workshop. This allows enough time for ITS to adjust your account in case you do not yet have access to the Linux systems.

U-M participates in SC18 conference in Dallas

University of Michigan researchers and IT staff wrapped up a successful Supercomputing ‘18 (SC18), held in Dallas Nov. 11-16, 2018, where they took part in a number of different aspects of the conference.

SC “Perennial” Quentin Stout, U-M professor of Electrical Engineering and Computer Science and one of only 19 people who have been to every Supercomputing conference, co-presented a tutorial titled Parallel Computing 101.

And with the recent announcement of a new HPC cluster on campus called Great Lakes, IT staff from Advanced Research Computing – Technology Services (ARC-TS) made presentations around the conference on the details of the new supercomputer.

U-M once again shared a booth with Michigan State University, highlighting our computational and data-intensive research as well as the comprehensive set of tools and services we provide to our researchers. Representatives from all ARC units were at the booth: ARC-TS, the Michigan Institute for Data Science (MIDAS), the Michigan Institute for Computational Discovery and Engineering (MICDE), and Consulting for Statistics, Computing and Analytics Research (CSCAR).

The booth also featured two demonstrations: one on the Open Storage Research Infrastructure or OSiRIS, the multi-institutional software-defined data storage system, and the Services Layer At The Edge (SLATE) project, both of which are supported by the NSF; the other tested conference-goers’ ability to detect “fake news” stories compared to an artificial intelligence system created by researchers supported by MIDAS.

Gallery

U-M Activities

  • Tutorial: Parallel Computing 101: Prof. Stout and Associate Professor Christiane Jablonowski of the U-M Department of Climate and Space Sciences and Engineering provided a comprehensive overview of parallel computing.
  • Introduction to Kubernetes. Presented by Bob Killen, Research Cloud Administrator, and Scott Paschke, Research Cloud Solutions Designer, both from ARC-TS. Containers have shifted the way applications are packaged and delivered. Their use in data science and machine learning is skyrocketing, with the beneficial side effect of enabling reproducible research. This rise in use has created a need to explore and adopt better container-centric orchestration tools. Of these tools, Kubernetes, an open-source container platform born within Google, has become the de facto standard. This half-day tutorial introduced researchers and sysadmins who may already be familiar with container concepts to the architecture and fundamental concepts of Kubernetes. Attendees explored these concepts through a series of hands-on exercises and left with a leg up in continuing their container education and a better understanding of how Kubernetes may be used for research applications.
  • Brock Palen, Director of ARC-TS, spoke about the new Great Lakes HPC cluster:
    • DDN booth (3123)
    • Mellanox booth (3207)
    • Dell booth (3218)
    • SLURM booth (1242)
  • Todd Raeker, Research Technology Consultant for ARC-TS, went to the Globus booth (4201) to talk about U-M researchers’ use of the service.
  • Birds of a Feather: Meeting HPC Container Challenges as a Community. Bob Killen, Research Cloud Administrator at ARC-TS, gave a lightning talk as part of this session that presented, prioritized, and gathered input on top issues and budding solutions around containerization of HPC applications.
  • Sharon Broude Geva, Director of ARC, was live on the SC18 News Desk discussing ARC HPC services, Women in HPC, and the Coalition for Scientific Academic Computation (CASC). The stream was available from the Supercomputing Twitter account: https://twitter.com/Supercomputing
  • Birds of a Feather: Ceph Applications in HPC Environments: Ben Meekhof, HPC Storage Administrator at ARC-TS, gave a lightning talk on Ceph and OSiRIS as part of this session. More details at https://www.msi.umn.edu/ceph-hpc-environments-sc18
  • ARC was a sponsor of the Women in HPC Reception; see the event description for more details. Sharon Broude Geva, Director of ARC, gave a presentation.
  • Birds of a Feather: Cloud Infrastructure Solutions to Run HPC Workloads: Bob Killen, Research Cloud Administrator at ARC-TS, presented at this session aimed at architects, administrators, software engineers, and scientists interested in designing and deploying cloud infrastructure solutions such as OpenStack, Docker, Charliecloud, Singularity, Kubernetes, and Mesos.
  • Jing Liu of the Michigan Institute for Data Science, participated in a panel discussion at the Purdue University booth.

Follow ARC on Twitter at https://twitter.com/ARC_UM for updates.

Beta cluster available for learning Slurm; new scheduler to be part of upcoming cluster updates

New HPC resources to replace Flux and updates to Armis are coming.  They will run a new scheduling system (Slurm). You will need to learn the commands in this system and update your batch files to successfully run jobs. Read on to learn the details and how to get training and adapt your files.

In anticipation of these changes, ARC-TS has created the test cluster “Beta,” which will provide a testing environment for the transition to Slurm. Slurm will be used on Great Lakes; the Armis HIPAA-aligned cluster; and a new cluster called “Lighthouse” which will succeed the Flux Operating Environment in early 2019.

Currently, Flux and Armis use the Torque (PBS) resource manager and the Moab scheduling system; when completed, Great Lakes and Lighthouse will use the Slurm scheduler and resource manager, which will enhance the performance and reliability of the new resources. Armis will transition from Torque to Slurm in early 2019.

The Beta test cluster is available to all Flux users, who can log in via ssh to beta.arc-ts.umich.edu. Beta has its own /home directory, so users will need to create or transfer any files they need via scp/sftp or Globus.
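
For example, logging in and copying files to Beta from the command line might look like the following, where “uniqname” and the file names are placeholders; Globus is an alternative for larger transfers.

    ssh uniqname@beta.arc-ts.umich.edu                          # log in to the Beta cluster
    scp input.dat uniqname@beta.arc-ts.umich.edu:~/project/     # copy a single file into Beta's /home
    scp -r results/ uniqname@beta.arc-ts.umich.edu:~/project/   # copy a whole directory
    sftp uniqname@beta.arc-ts.umich.edu                         # or use an interactive sftp session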

Slurm commands will be needed to submit jobs. For a comparison of Slurm and Torque commands, see our Torque to Slurm migration page. For more information, see the Beta home page.
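
A few of the most common translations, as a rough guide (the migration page has the complete list; the job ID shown is a placeholder):

    qsub job.pbs        ->   sbatch job.sh              # submit a batch script
    qstat -u $USER      ->   squeue -u $USER            # list your queued and running jobs
    qdel 12345          ->   scancel 12345              # cancel a job by ID
    qhold 12345         ->   scontrol hold 12345        # place a job on hold
    qstat -f 12345      ->   scontrol show job 12345    # show detailed information about a job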

Support staff from ARC-TS and individual academic units will conduct several in-person and online training sessions to help users become familiar with Slurm. We have been testing Slurm for several months, and believe the performance gains, user communications, and increased reliability will significantly improve the efficiency and effectiveness of the HPC environment at U-M.

The tentative time frame for replacing or transitioning current ARC-TS resources is:

  • Flux to Great Lakes, first half of 2019
  • Armis from Torque to Slurm, January 2019
  • Flux Operating Environment to Lighthouse, first half of 2019
  • Open OnDemand on Beta, which replaces ARC Connect for web-based job submissions, Jupyter Notebooks, Matlab, and additional software packages, fall 2018

U-M selects Dell EMC, Mellanox and DDN to Supply New “Great Lakes” Computing Cluster

The University of Michigan has selected Dell EMC as lead vendor to supply its new $4.8 million Great Lakes computing cluster, which will serve researchers across campus. Mellanox Technologies will provide networking solutions, and DDN will supply storage hardware.

Great Lakes will be available to the campus community in the first half of 2019, and over time will replace the Flux supercomputer, which serves more than 2,500 active users at U-M for research ranging from aerospace engineering simulations and molecular dynamics modeling to genomics and cell biology to machine learning and artificial intelligence.

Great Lakes will be the first cluster in the world to use the Mellanox HDR 200 gigabit per second InfiniBand networking solution, enabling faster data transfer speeds and increased application performance.

“High-performance research computing is a critical component of the rich computing ecosystem that supports the university’s core mission,” said Ravi Pendse, U-M’s vice president for information technology and chief information officer. “With Great Lakes, researchers in emerging fields like machine learning and precision health will have access to a higher level of computational power. We’re thrilled to be working with Dell EMC, Mellanox, and DDN; the end result will be improved performance, flexibility, and reliability for U-M researchers.”

“Dell EMC is thrilled to collaborate with the University of Michigan and our technology partners to bring this innovative and powerful system to such a strong community of researchers,” said Thierry Pellegrino, vice president, Dell EMC High Performance Computing. “This Great Lakes cluster will offer an exceptional boost in performance, throughput and response to reduce the time needed for U-M researchers to make the next big discovery in a range of disciplines from artificial intelligence to genomics and bioscience.”

The main components of the new cluster are:

  • Dell EMC PowerEdge C6420 compute nodes, PowerEdge R640 high memory nodes, and PowerEdge R740 GPU nodes
  • Mellanox HDR 200Gb/s InfiniBand ConnectX-6 adapters, Quantum switches and LinkX cables, and InfiniBand gateway platforms
  • DDN GRIDScaler® 14KX® and 100 TB of usable IME® (Infinite Memory Engine) memory

“HDR 200G InfiniBand provides the highest data speed and smart In-Network Computing acceleration engines, delivering HPC and AI applications with the best performance, scalability and efficiency,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “We are excited to collaborate with the University of Michigan, Dell EMC and DataDirect Networks, in building a leading HDR 200G InfiniBand-based supercomputer, serving the growing demands of U-M researchers.”

“DDN has a long history of working with Dell EMC and Mellanox to deliver optimized solutions for our customers. We are happy to be a part of the new Great Lakes cluster, supporting its mission of advanced research and computing. Partnering with forward-looking thought leaders such as these is always enlightening and enriching,” said Dr. James Coomer, SVP Product Marketing and Benchmarks at DDN.

Great Lakes will provide a significant improvement in computing performance over Flux. For example, each compute node will have more cores, higher maximum speeds, and increased memory. The cluster will also have improved internet connectivity and file system performance, as well as NVIDIA GPUs with Tensor Cores, which are far more powerful for machine learning than prior generations of GPUs.

“Users of Great Lakes will have access to more cores, faster cores, faster memory, faster storage, and a more balanced network,” said Brock Palen, Director of Advanced Research Computing – Technology Services (ARC-TS).

The Flux cluster was created approximately 8 years ago, although many of the individual nodes have been added since then. Great Lakes represents an architectural overhaul that will result in better performance and efficiency. Based on extensive input from faculty and other stakeholders across campus, the new Great Lakes cluster will be designed to deliver similar services and capabilities as Flux, including the ability to accommodate faculty purchases of hardware, access to GPUs and large-memory nodes, and improved support for emerging uses such as machine learning and genomics.

ARC-TS will operate and maintain the cluster once it is built. Allocations of computing resources through ARC-TS include access to hundreds of software titles, as well as support and consulting from professional staff with decades of combined experience in research computing.

Updates on the progress of Great Lakes will be available at https://arc-ts.umich.edu/greatlakes/.

Advanced batch computing on the Flux cluster

Overview

This course will cover some more advanced topics in cluster computing on the U-M Flux Cluster. Topics to be covered include a review of common parallel programming models and basic use of Flux; dependent and array scheduling; advanced troubleshooting and analysis using checkjob, qstat, and other tools; and parallel debugging and profiling of C and Fortran code, including logging, gdb (line-oriented debugging), ddt (GUI-based debugging) and map (GUI-based profiling) of MPI and OpenMP programs. We will issue you a temporary allocation to use for the course, or you can use your existing Flux allocations, if any.
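
As a small taste of the scheduling topics, the Torque/PBS commands below submit a ten-task array job and a follow-up job that waits for another job to finish successfully; the script names, array range, and job ID are placeholders.

    qsub -t 1-10 analyze.pbs                  # array job: each task gets its index in $PBS_ARRAYID
    # inside analyze.pbs the index can pick an input, e.g. INPUT=data_${PBS_ARRAYID}.csv

    qsub -W depend=afterok:12345 summary.pbs  # dependent job: runs only after job 12345 ends successfully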

Prerequisites

Introduction to Batch Computing on Flux or equivalent. This course assumes familiarity with the Linux command line, text editing on Linux, and a basic understanding of Flux including how to submit and track jobs.

Instructors

Dr. Charles J Antonelli
Advocacy and Research Support
LSA Information Technology

Charles is a High Performance Computing Consultant in the Advocacy and Research Support Group of LSAIT at the University of Michigan, where he is responsible for high performance computing support and education, and is an advocate to the Departments of History and Communications. Prior to this, he built a parallel data ingestion component of a novel earth science data assimilation system and a secure packet vault, and he worked on the No. 5 ESS Switch at Bell Labs in the 1980s. He has taught courses in operating systems, distributed file systems, C++ programming, security, and database application design.

Mark Champe
Advocacy and Research Support
LSA Information Technology

Materials

Course Preparation

In order to participate successfully in the workshop exercises, you must have a Flux user account, a Flux allocation (one is created for each workshop), and be enrolled in Duo. The user account allows you to log in to the cluster, create, compile, and test applications, and prepare Flux jobs for submission. The allocation allows you to submit those jobs, executing those applications in parallel on the cluster and charging their resource use against the allocation. Duo is required to help authenticate you to the cluster.


Flux user account

A single Flux user account can be used to prepare and submit jobs using various allocations. If you already possess a user account, you can use it for this course and skip to “Flux allocation” below. If not, please visit https://arc-ts.umich.edu/fluxform to obtain one. A user account is free to members of the University community. Please note that obtaining an account requires human processing, so be sure to do this at least two business days before class begins.


Flux allocation

We create an allocation for the workshop so you can run jobs on the cluster during the workshop and for one day after for those who would like additional practice. The workshop allocation is quite limited and is intended only to run examples to help you cement the details of job submission and management. If you already have an existing Flux allocation, you can use that, though if there are any issues with that allocation, we will ask you to use the workshop allocation.


Duo Authentication

Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH password as well as authenticate through Duo in order to access Flux.

If you need to enroll in Duo, follow the instructions at Getting Started: How to Enroll in Duo.

Please enroll in Duo before you come to class.

Laptop Preparation

You do not need to bring your own laptop to class. The classroom contains Windows computers, which require your uniqname and UMICH password to log in, and that have all necessary software pre-loaded.

If you want to use a laptop for the course, you are welcome to do so:  please see our web page on Preparing your laptop to use Flux. However, if there are problems connecting your laptop, you will be asked to switch to the provided computer for the class. We cannot stop to debug connection issues with personal or departmental laptops during the class.

More help

Please use our class mailing list hpc201course@umich.edu for questions, comments, or to seek further assistance.
