
New private insurance claims dataset and analytic support now available to health care researchers

By | General Interest, Happenings, HPC, News

The Institute for Healthcare Policy and Innovation (IHPI) is partnering with Advanced Research Computing (ARC) to bring two commercial claims datasets to campus researchers.

The OptumInsight and Truven Marketscan datasets contain nearly complete insurance claims and other health data on tens of millions of people representing the US private insurance population. Within each dataset, records can be linked longitudinally for over 5 years.  

To begin working with the data, researchers should submit a brief analysis plan for review by IHPI staff, who will create extracts or grant access to primary data as appropriate.

CSCAR consultants are available to provide guidance on computational and analytic methods for a variety of research aims, including use of Flux and other UM computing infrastructure for working with these large and complex repositories.

Contact Patrick Brady (pgbrady@umich.edu) at IHPI or James Henderson (jbhender@umich.edu) at CSCAR for more information.

The acquisition and availability of the data were funded by IHPI and the U-M Data Science Initiative.

University of Michigan awarded Women in High Performance Computing chapter

By | General Interest, News

The University of Michigan has been recognized as one of the first Chapters in the new Women in High Performance Computing (WHPC) Pilot Program.

“The WHPC Chapter Pilot will enable us to reach an ever-increasing community of women, provide these women with the networks that we recognize are essential for them excelling in their career, and retaining them in the workforce,” says Dr. Sharon Broude Geva, WHPC’s Director of Chapters and Director of Advanced Research Computing (ARC) at the University of Michigan (U-M). “At the same time, we envisage that the new Chapters will be able to tailor their activities to the needs of their local community, as we know that there is no ‘one size fits all’ solution to diversity.”

“At WHPC we are delighted to be accepting the University of Michigan as a Chapter under the pilot program, and working with them to build a sustainable solution to diversifying the international HPC landscape,” said Dr. Toni Collis, Chair and co-founder of WHPC, and Chief Business Development Officer at Appentra Solutions.

The process of selecting organizations to participate in the program accounted for potential conflicts of interest; Geva did not vote on U-M’s application.

About Women in High Performance Computing (WHPC) and the Chapters and Affiliates Pilot Program

Women in High Performance Computing (WHPC) was created with the vision to encourage women to participate in the HPC community by providing fellowship, education, and support to women and the organizations that employ them. Through collaboration and networking, WHPC strives to bring together women in HPC and technical computing while encouraging women to engage in outreach activities and improve the visibility of inspirational role models.

WHPC has launched a pilot program for groups to become Affiliates or Chapters. The program will share the knowledge and expertise of WHPC as well as help to tailor activities and develop diversity and inclusion goals suitable to the needs of local HPC communities. During the pilot, WHPC will work with the Chapters and Affiliates to support and promote the work of women in their organizations, develop crucial role models, and assist employers in the recruitment and retention of a diverse and inclusive HPC workforce.

WHPC is stewarded by EPCC at the University of Edinburgh. For more information visit http://www.womeninhpc.org.  

For more information on the U-M chapter, contact Dr. Geva at sgeva@umich.edu.

ARC-TS joins Cloud Native Computing Foundation

By | General Interest, Happenings, News

Advanced Research Computing – Technology Services (ARC-TS) at the University of Michigan has become the first U.S. academic institution to join the Cloud Native Computing Foundation (CNCF), a foundation that advances the development and use of cloud native applications and services. Founded in 2015, CNCF is part of the Linux Foundation.

CNCF announced ARC-TS’s membership at the KubeCon and CloudNativeCon event in Copenhagen. A video of the opening remarks by CNCF Executive Director Dan Kohn can be viewed on the event website.

“Our membership in the CNCF signals our commitment to bringing cloud computing and containers technology to researchers across campus,” said Brock Palen, Director of ARC-TS. “Kubernetes and other CNCF platforms are becoming crucial tools for advanced machine learning, pipelining, and other research methods. We also look forward to bringing an academic perspective to the foundation.”

ARC-TS’s membership and participation in the group signals its adoption of, and commitment to, cloud-native technologies and practices. Users of containers and other CNCF services will have access to experts in the field.

Membership gives the U-M research community input into the continuing development of cloud-native applications, both within CNCF-managed projects and in ancillary ones. U-M is the second academic institution to join the foundation, and the only one in the U.S.

ARC-TS begins work on new “Great Lakes” cluster to replace Flux

By | Flux, Happenings, HPC, News

Advanced Research Computing – Technology Services (ARC-TS) is starting the process of creating a new, campus-wide computing cluster, “Great Lakes,” that will serve the broad needs of researchers across the University. Over time, Great Lakes will replace Flux, the shared research computing cluster that currently serves over 300 research projects and 2,500 active users.

“Researchers will see improved performance, flexibility and reliability associated with newly purchased hardware, as well as changes in policies that will result in greater efficiencies and ease of use,” said Brock Palen, director of ARC-TS.

The Great Lakes cluster will be available to all researchers on campus for simulation, modeling, machine learning, data science, genomics, and more. The platform will provide a balanced combination of computing power, I/O performance, storage capability, and accelerators.

ARC-TS is in the process of procuring the cluster. Only minimal interruption to ongoing research is expected. A “Beta” cluster will be available to help researchers learn the new system before Great Lakes is deployed in the first half of 2019.

The Flux cluster is approximately 8 years old, although many of the individual nodes are newer. One of the benefits of replacing the cluster is to create a more homogeneous platform.

Based on extensive input from faculty and other stakeholders across campus, the new Great Lakes cluster will be designed to deliver services and capabilities similar to those of Flux, including the ability to accommodate faculty purchases of hardware, access to GPUs and large-memory nodes, and improved support for emerging uses such as machine learning and genomics. The cluster will consist of approximately 20,000 cores.

For more information, contact hpc-support@umich.edu, and see arc-ts.umich.edu/systems-services/greatlakes, where updates to the project will be posted.

The 2018 MICDE Symposium: Summary by Bradley Dice, Ph.D. student in Physics and Computational Science

By | Uncategorized

This piece was first published on LinkedIn by Bradley Dice, U-M Ph.D. student in Physics and Computational Science.

MICDE Symposium 2018: Computation, A Pillar of Science and a Lens to the Future

High-performance computing (HPC) is becoming an increasingly powerful tool in the hands of scientists, driving new discoveries in physical sciences, life sciences, and social sciences. The development of new (frequently domain-specific) approaches to machine learning and faster, smarter processing of sets of Big Data allows us to explore questions that were previously impossible to study. Yesterday, I presented a poster at the Michigan Institute for Computational Discovery & Engineering (MICDE) annual Symposium and attended a number of talks by researchers working at the intersection of high-performance computing and their domain science. The theme for the symposium was “Computation: A Pillar of Science and a Lens to the Future.”

Collaborative Computational Science with signac

My scientific work, and the work of my colleagues in the Glotzer lab, has been made vastly more efficient through the use of tools for collaborative science, particularly the signac framework. I presented a poster about how the signac framework (composed of open-source Python packages signac, signac-flow, and signac-dashboard) enables scientists to rapidly simulate, model, and analyze data. The name comes from painter Paul Signac, who, along with Georges Seurat, founded the style of pointillism. This neo-impressionist style uses tiny dots of color instead of long brushstrokes, which collectively form a beautiful image when the viewer steps back. This metaphor fits the way that a lot of science works: given only points of data, scientists aim to see the whole picture and tell its story. Since our lab studies materials, our “points” of data fit into a multidimensional parameter space, where quantities like pressure and temperature, or even particles’ shapes, may vary. Using this data, our lab computationally designs novel materials from nanoparticles and studies the physics of complex crystalline structures.

The core signac package, which acts as a database on top of the file system, helps organize and manage scientific data and metadata. Its companion tool signac-flow enables users to quickly define “workflows” that run on supercomputing clusters, determining what operations to perform and submitting the jobs to the cluster for processing. Finally, signac-dashboard (which I develop) provides a web-based data visualization interface that allows users to quickly scan for interesting results and answer scientific questions. These tools include tutorials and documentation to help users get acquainted and on to doing science as quickly as possible. Importantly, the tools are not specific to materials science. Many scientific fields have similar questions, and the toolkit can easily be applied in fields where exploration or optimization within parameter spaces is common, ranging from fluid mechanics to machine learning.
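The idea of “a database on top of the file system” can be illustrated with a minimal sketch that uses only the Python standard library. This is not signac's API or its implementation: the function names `job_id`, `open_job`, and `find_jobs`, the `state_point.json` file name, and the hashing scheme are all hypothetical choices made for illustration. Each point in parameter space gets its own directory, named by a hash of its parameters, with the parameters stored beside the data as searchable metadata.

```python
import hashlib
import json
import os


def job_id(state_point):
    """Derive a stable directory name from a dict of parameters (hypothetical scheme)."""
    blob = json.dumps(state_point, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def open_job(root, state_point):
    """Create (or reuse) the directory that holds one point of data."""
    path = os.path.join(root, job_id(state_point))
    os.makedirs(path, exist_ok=True)
    # Store the parameters next to the data so the directory is self-describing.
    with open(os.path.join(path, "state_point.json"), "w") as fh:
        json.dump(state_point, fh, sort_keys=True)
    return path


def find_jobs(root, **criteria):
    """Return the state points whose metadata match every given criterion."""
    matches = []
    for name in sorted(os.listdir(root)):
        meta = os.path.join(root, name, "state_point.json")
        if not os.path.isfile(meta):
            continue
        with open(meta) as fh:
            state_point = json.load(fh)
        if all(state_point.get(k) == v for k, v in criteria.items()):
            matches.append(state_point)
    return matches
```

Calling `open_job(root, {"pressure": 1.0, "temperature": 0.5})` twice returns the same directory, and `find_jobs(root, pressure=1.0)` selects every stored point at that pressure. Simulation output files would simply be written inside each job directory.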

During the symposium, I learned a lot about how others are using scientific computing in their own work. The symposium speakers came from a wide range of fields, including biology, mathematics, and fluid dynamics. Some of my favorite talks are described below.

The Past: Phylogeny and Uncovering Life’s Origins

High-performance computing is enabling scientists to look in all sorts of directions, including into the past. Stephen Smith, Assistant Professor of Ecology and Evolutionary Biology at the University of Michigan, talked about his lab’s research in detecting evolutionary patterns using genomic data. From the wealth of genetic data that scientists have collected, the Smith lab aims to improve our understanding of the “tree of life”: the overarching phylogenetic tree that can explain the progress of speciation over time. Projects like Open Tree of Life and PHLAWD, an open-source C++ project to process data from the National Center for Biotechnology Information’s GenBank data source, are just two of the ways that open science and big data are informing our understanding of life itself.

The Present: From Algebra to Autonomy

Cleve Moler, the original author of the MATLAB language and chief mathematician, chairman, and cofounder of MathWorks, spoke about his career and how the tools MATLAB has provided for numerical linear algebra (and many other computational tasks) have been important for the development of science and engineering over the last 34 years. MATLAB is taught to STEM students in many undergraduate curricula, and is used widely across industry to simulate and model the behavior of real systems. Features like the Automated Driving System Toolbox are poised to play a role in autonomous vehicles and the difficult computational tasks inherent in their operation.

The Future: Parallel-in-Time Predictions and Meteorology

A significant challenge in weather and climate modeling is that supercomputer architectures are highly parallel, while many simulations of fluids are inherently serial: each timestep must be computed before the next timestep can begin. Beth Wingate, Professor of Mathematics at the University of Exeter and published poet, is developing a powerful approach that may change the way that such models work. Called “parallel-in-time,” it separates the effects of slow dynamics and fast dynamics, enabling parallel architectures to take advantage of longer timesteps and separate the work across many processors.
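To make the serial-timestep bottleneck concrete, here is a minimal sketch of the classic Parareal iteration, a well-known parallel-in-time scheme. It is not the method presented in the talk, which additionally separates slow and fast dynamics; the test equation, the explicit Euler propagators, and the names `euler` and `parareal` are assumptions chosen for illustration. A cheap coarse solver sweeps through time serially, while the expensive fine solves on each time slice are independent of one another and could be distributed across processors.

```python
def euler(f, y, t0, t1, steps):
    """Advance y' = f(t, y) from t0 to t1 with explicit Euler steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_end, slices, iterations, coarse_steps=1, fine_steps=100):
    """Return the solution at each slice boundary after the given Parareal iterations."""
    ts = [t_end * n / slices for n in range(slices + 1)]

    def coarse(y, n):
        return euler(f, y, ts[n], ts[n + 1], coarse_steps)

    def fine(y, n):
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one cheap serial sweep with the coarse solver.
    u = [y0]
    for n in range(slices):
        u.append(coarse(u[n], n))

    for _ in range(iterations):
        # The fine solves depend only on the previous iterate, so each slice
        # could run on its own processor; they are serial here for simplicity.
        fine_vals = [fine(u[n], n) for n in range(slices)]
        coarse_old = [coarse(u[n], n) for n in range(slices)]
        # Serial correction sweep: cheap, because it uses only the coarse solver.
        new = [y0]
        for n in range(slices):
            new.append(coarse(new[n], n) + fine_vals[n] - coarse_old[n])
        u = new
    return u
```

For example, `parareal(lambda t, y: -y, 1.0, 2.0, slices=10, iterations=3)` approximates the decay of y' = -y on [0, 2]. Running as many iterations as there are slices reproduces the serial fine solution, so the scheme pays off only when far fewer iterations suffice.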

Conclusions

Computational science is growing rapidly, improving our ability to address the most pressing questions and the mysteries of our world. As new supercomputing resources come online, such as Oak Ridge National Laboratory’s Summit, the promise of exascale computing is coming ever closer to reality. I look forward to what the next year of HPC will bring to our world.

Interdisciplinary Committee on Organizational Studies (ICOS) Big Data Summer Camp, May 14-18

By | Data, Educational, General Interest, Happenings, News

Social and organizational life are increasingly conducted online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” Within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “using big data,” and we will all be using terms like API and Python.

This year ICOS, MIDAS, and ARC are again offering a one-week “big data summer camp” for doctoral students interested in organizational research, with a combination of detailed examples from researchers; hands-on instruction in Python, SQL, and APIs; and group work to apply these ideas to organizational questions. Enrollment is free, but students must commit to attending all day for each day of camp, and be willing to work in interdisciplinary groups.

The camp runs all day, May 14-18.

https://ttc.iss.lsa.umich.edu/ttc/sessions/interdisciplinary-committee-on-organizational-studies-icos-big-data-summer-camp-3/ 

U-M launches Data Science Master’s Program

By | Educational, General Interest, Happenings, News

The University of Michigan’s new, interdisciplinary Data Science Master’s Program is taking applications for its first group of students. The program is aimed at teaching participants how to extract useful knowledge from massive datasets using computational and statistical techniques.

The program is a collaboration between the College of Engineering (EECS), the College of Literature, Science, and the Arts (Statistics), the School of Public Health (Biostatistics), the School of Information, and the Michigan Institute for Data Science.

“We are very excited to be offering this unique collaborative program, which brings together expertise from four key disciplines at the University in a curriculum that is at the forefront of data science,” said HV Jagadish, Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, who chairs the program committee.

“MIDAS was a catalyst in bringing faculty from multiple disciplines together to work towards the development of this new degree program,” he added.

MIDAS will provide students in this program with interdisciplinary collaborations, intellectual stimulation, exposure to a broad range of practice, networking opportunities, and space on Central Campus to meet for formal and informal gatherings.

For more information, see the program website at https://lsa.umich.edu/stats/masters_students/mastersprograms/data-science-masters-program.html, and the program guide (PDF) at https://lsa.umich.edu/content/dam/stats-assets/StatsPDF/MSDS-Program-Guide.pdf.

Applications are due March 15.

HPC training workshops begin Tuesday, Feb. 13

By | Educational, Events, General Interest, Happenings, HPC, News

A series of training workshops in high performance computing will be held Feb. 13 through March 6, 2018, presented by CSCAR in conjunction with Advanced Research Computing – Technology Services (ARC-TS).

Introduction to the Linux command line
This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also known as the “command line.”
Location: East Hall, Room B254, 530 Church St.
Dates: (Please sign up for only one)
• Tuesday, Feb. 13, 1 – 4 p.m. (full description | registration)
• Friday, Feb. 16, 9 a.m. – noon (full description | registration)

Introduction to the Flux cluster and batch computing
This workshop will provide a brief overview of the components of the Flux cluster, including the resource manager and scheduler, and will offer students hands-on experience.
Location: East Hall, Room B254, 530 Church St.
Dates: (Please sign up for only one)
• Monday, Feb. 19, 1 – 4 p.m. (full description | registration)
• Tuesday, March 6, 1 – 4 p.m. (full description | registration)

Advanced batch computing on the Flux cluster
This course will cover advanced areas of cluster computing on the Flux cluster, including common parallel programming models, dependent and array scheduling, and a brief introduction to scientific computing with Python, among other topics.
Location: East Hall, Room B250, 530 Church St.
Dates: (Please sign up for only one)
• Wednesday, Feb. 21, 1 – 5 p.m. (full description | registration)
• Friday, Feb. 23, 1 – 5 p.m. (full description | registration)

Hadoop and Spark workshop
Learn how to process large amounts (up to terabytes) of data using SQL and/or simple programming models available in Python, R, Scala, and Java.
Location: East Hall, Room B250, 530 Church St.
Dates: (Please sign up for only one)
• Thursday, Feb. 22, 1 – 5 p.m. (full description | registration)

U-M fosters thriving artificial intelligence and machine learning research

By | General Interest, HPC, News, Research

Research using machine learning and artificial intelligence — tools that allow computers to learn about and predict outcomes from massive datasets — has been booming at the University of Michigan. The potential societal benefits being explored on campus are numerous, from on-demand transportation systems to self-driving vehicles to individualized medical treatments to improved battery capabilities.

The ability of computers and machines generally to learn from their environments is having transformative effects on a host of industries — including finance, healthcare, manufacturing, and transportation — and could have an economic impact of $15 trillion globally according to one estimate.

But as these methods become more accurate and refined, and as the datasets needed become bigger and bigger, keeping up with the latest developments and identifying and securing the necessary resources — whether that means computing power, data storage services, or software development — can be complicated and time-consuming. And that’s not to mention complying with privacy regulations when medical data is involved.

“Machine learning tools have gotten a lot better in the last 10 years,” said Matthew Johnson-Roberson, Assistant Professor of Engineering in the Department of Naval Architecture & Marine Engineering and the Department of Electrical Engineering and Computer Science. “The field is changing now at such a rapid pace compared to what it used to be. It takes a lot of time and energy to stay current.”

Diagram representing the knowledge graph of an artificial intelligence system, courtesy of Jason Mars, assistant professor, Electrical Engineering and Computer Science, U-M

Johnson-Roberson’s research is focused on getting computers and robots to better recognize and adapt to the world, whether in driverless cars or deep-sea mapping robots.

“The goal in general is to enable robots to operate in more challenging environments with high levels of reliability,” he said.

Johnson-Roberson said that U-M has many of the crucial ingredients for success in this area — a deep pool of talented researchers across many disciplines ready to collaborate, flexible and personalized support, and the availability of computing resources that can handle storing the large datasets and heavy computing load necessary for machine learning.

“The people is one of the reasons I came here,” he said. “There’s a broad and diverse set of talented researchers across the university, and I can interface with lots of other domains, whether it’s the environment, health care, transportation or energy.”

“Access to high powered computing is critical for the computing-intensive tasks, and being able to leverage that is important,” he continued. “One of the challenges is the data. A major driver in machine learning is data, and as the datasets get more and more voluminous, so does the storage needs.”

Yuekai Sun, an assistant professor in the Statistics Department, develops algorithms and other computational tools to help researchers analyze large datasets, for example, in natural language processing. He agreed that being able to work with scientists from many different disciplines is crucial to his research.

“I certainly find the size of Michigan and the inherent diversity that comes with it very stimulating,” he said. “Having people around who are actually working in these application areas helps guide the direction and the questions that you ask.”

Sun is also working on analyzing the potential discriminatory effects of algorithms used in decisions like whether to give someone a loan or to grant prisoners parole.

“If you use machine learning, how do you hold an algorithm or the people who apply it accountable? What does it mean for an algorithm to be fair?” he said. “Can you check whether this notion of non-discrimination is satisfied?”

Jason Mars, an assistant professor in the Electrical Engineering and Computer Science department and co-founder of a successful spinoff called Clinc, is applying artificial intelligence to driverless car technology and a mobile banking app that has been adopted by several large financial institutions. The app, called Finie, provides a much more conversational interface between users and their financial information than other apps in the field.

“There is going to be an expansion of the number of problems solved and number of contributions that are AI-based,” Mars said. He predicted that more researchers at U-M will begin exploring AI and ML as they understand the potential.

“It’s going to require having the right partner, the right experts, the right infrastructure, and the best practices of how to use them,” he said.

He added that U-M does a “phenomenal job” in supporting researchers conducting AI and ML research.

“The level of support and service is awesome here,” he said. “Not to mention that the infrastructure is state of the art. We stay relevant to the best techniques and practices out there.”

Advanced Research Computing at U-M, in part through resources from the university-wide Data Science Initiative, provides computing infrastructure, consulting expertise, and support for interdisciplinary research projects to help scientists conducting artificial intelligence and machine learning research.

For example, Consulting for Statistics, Computing and Analytics Research, an ARC unit, has several consultants on staff with expertise in machine learning and predictive analysis with large, complex, and heterogeneous data. CSCAR recently expanded capacity to support very large-scale machine learning using tools such as Google’s TensorFlow.

CSCAR consultants are available by appointment or on a drop-in basis, free of charge. See cscar.research.umich.edu or email cscar@umich.edu for more information.

CSCAR also provides workshops on topics in machine learning and other areas of data science, including sessions on Machine Learning in Python, and an upcoming workshop in March titled “Machine Learning, Concepts and Applications.”

The computing resources available to machine learning and artificial intelligence researchers are significant and diverse. Along with the campus-wide high performance computing cluster known as Flux, the recently announced Big Data cluster Cavium ThunderX will give researchers a powerful new platform for hosting artificial intelligence and machine learning work. Both clusters are provided by Advanced Research Computing – Technology Services (ARC-TS).

All allocations on ARC-TS clusters include access to software packages that support AI/ML research, including TensorFlow, Torch, and Spark ML, among others.

ARC-TS also operates the Yottabyte Research Cloud (YBRC), a customizable computing platform that recently gained the capacity to host and analyze data governed by the HIPAA federal privacy law.

The Michigan Institute for Data Science (MIDAS), also a unit of ARC, has supported several AI/ML projects through its Challenge Initiative program, which has awarded more than $10 million in research support since 2015.

For example, the Analytics for Learners as People project is using sensor-based machine learning tools to translate data on academic performance, social media, and survey data into attributes that will form student profiles. Those profiles will help link academic performance and mental health with the personal attributes of students, including values, beliefs, interests, behaviors, background, and emotional state.

Another example is the Reinventing Public Urban Transportation and Mobility project, which is using predictive models based on machine learning to develop on-demand, multi-modal transportation systems for urban areas.

In addition, MIDAS supports student groups involved in this type of research such as the Michigan Student Artificial Intelligence Lab (MSAIL) and the Michigan Data Science Team (MDST).

(A version of this piece appeared in the University Record.)

Video available from MIDAS Research Forum

By | General Interest, Happenings, News, Research

Video from the MIDAS Research Forum, held Dec. 1 in the Michigan League, is now available at http://myumi.ch/6vA3V

The forum featured U-M students and faculty showcasing their data science research; a workshop on how to work with industry; presentations from student groups; and a summary of the data science consulting and infrastructure services available to the U-M research community.

NOTE: The keynote presentation from Christopher Rozell of the Georgia Institute of Technology will be available in the near future.