Engaging the Web with R

Well known for its statistical capabilities, R can also be used for web scraping, connecting with websites via APIs, creating HTML documents and presentations, interactive visualizations, dashboards, and even building entire, possibly interactive, websites. This talk will provide an overview of web-based uses of R. Conceptual introductions, package synopses, and small demonstrations will be presented.

Machine Learning in Python (Scikit-Learn)

This workshop will cover the essentials of unsupervised machine learning algorithms using Python’s Scikit-learn library. We will focus on K-Means and Principal Component Analysis (PCA). The workshop is designed for intermediate to advanced Python users. The session will be held in a computer lab and participants can choose to work on practice exercises either individually or in small groups.
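For readers new to these two methods, the sketch below shows one common way K-Means and PCA are chained in Scikit-learn: standardize the features, assign cluster labels with `KMeans`, then project the data onto two principal components (for example, to plot the clusters). It is a minimal sketch, not workshop material. It assumes NumPy and scikit-learn are installed, and the helper name `cluster_and_project` and the synthetic three-blob data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, n_clusters=3, n_components=2, random_state=0):
    """Hypothetical helper: scale X, cluster it with K-Means, and project it with PCA."""
    # Put every feature on the same scale so no single column dominates distances.
    X_scaled = StandardScaler().fit_transform(X)

    # K-Means: n_init restarts from different centroids and keeps the best fit.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X_scaled)

    # PCA: keep the directions of largest variance (useful for 2-D plots).
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_scaled)
    return labels, X_reduced, pca.explained_variance_ratio_


# Synthetic data (illustrative only): three well-separated blobs in 4 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 4)) for c in centers])

labels, X_2d, ratios = cluster_and_project(X)
```

Standardizing first is a design choice rather than a requirement. Both K-Means and PCA are sensitive to feature scale, so unscaled columns with large ranges would otherwise dominate the clusters and the components.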

Funding available for data set acquisition

By | Funding Opportunities, General Interest, News

The new Data Acquisition for Data Science (DADS) program supports acquisition, preparation, management, and maintenance of specialized research data sets used in current and future data science-enabled research projects across U-M, with special focus on the four challenge initiative areas pursued by the Michigan Institute for Data Science (MIDAS): transportation science, health science, social science, and learning analytics.

DADS is meant to provide datasets that can be used by multiple U-M researchers and departments.

DADS is funded through the Data Science Initiative (DSI); total funding is capped at $200,000 per year for 5 years.

DADS will be managed jointly by the Library and Advanced Research Computing (ARC), with support from ARC’s Consulting for Statistics, Computing, and Analytics Research (CSCAR), MIDAS, and ARC-Technology Services (ARC-TS) units.

For more information, see arc.umich.edu/dads.

Data Science Skills Series: Web Scraping with Python

Web Scraping with Python

We will provide an overview of how to scrape data from HTML pages and website APIs using Python. For demonstration purposes, we will scrape sports and weather data, mostly using the Python Requests and Beautiful Soup libraries.
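As a rough illustration of how the two libraries divide the work, the sketch below uses Requests to download a page or call a JSON API and Beautiful Soup to turn an HTML table into Python dictionaries. It is a minimal sketch, not the session's code. It assumes the `requests` and `beautifulsoup4` packages are installed, and the function names (`fetch_html`, `fetch_json`, `parse_table`) and the sample weather table are hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its HTML text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def fetch_json(url, params=None, timeout=10):
    """Hypothetical helper: call a JSON web API and return the decoded body."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


def parse_table(html):
    """Turn the first <table> in an HTML document into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr") if table is not None else []
    if not rows:
        return []
    # Treat the first row as the header row.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        if values:
            records.append(dict(zip(headers, values)))
    return records


# Hypothetical weather table; in practice the HTML would come from fetch_html(url).
sample_html = """
<table>
  <tr><th>City</th><th>High (F)</th></tr>
  <tr><td>Ann Arbor</td><td>72</td></tr>
  <tr><td>Detroit</td><td>75</td></tr>
</table>
"""
weather = parse_table(sample_html)
```

Keeping the download step separate from the parsing step means the parser can be developed and tested against saved HTML without repeatedly hitting the website.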

No registration is necessary.

Data Science Skills Series Week 2

Data processing and visualization in R

This workshop will delve into common data processing and exploration techniques, especially as a prelude to visualization.  The main focus will be the dplyr and data.table packages, which enhance and facilitate the sorts of operations that typically arise when dealing with data, including faster I/O and grouped operations.  For visualization, the focus will be on using ggplot2 and other packages that allow for interactivity.

No registration is necessary.

Data Science Skills Series Week 1

Fast data processing with Go

We will demonstrate basic Go using several practical examples of data manipulation. Go (golang.org) is an open-source programming language that shares many syntax features with high-level languages like Python, but is statically typed, compiled, and has native concurrency support. For these reasons, Go can be extremely efficient at many complex data processing tasks.

No registration is necessary.

U-M telecast of XSEDE Big Data workshop

XSEDE and the Pittsburgh Supercomputing Center are presenting a one-day Big Data workshop focusing on topics such as Hadoop and Spark. U-M is one of several sites around the country that will host a telecast of the session. Registration is required, as space is limited.

Schedule:

11:00 Welcome
11:25 Intro to Big Data
11:45 Hadoop
12:15 Hadoop (continued)
1:00 Lunch break
2:00 Exercises
2:45 Spark
3:45 Exercises
4:15 A Big Big Data Platform
5:00 Adjourn

U-M, Coursera offer five-course specialization in Applied Data Science with Python

By | Educational, General Interest, News

Coursera and the University of Michigan are offering a five-course specialization in Applied Data Science with Python starting in September. The courses cost $79 each, and students who complete all coursework, including a capstone project, will receive a certificate.

The courses, taught by U-M faculty members Christopher Brooks (SI), Kevyn Collins-Thompson (SI and EECS), Daniel Romero (SI and EECS) and VG Vinod Vydiswaran (Medical School and SI), are:

  • Introduction to Data Science in Python
  • Applied Plotting, Charting and Data Representation in Python
  • Applied Machine Learning in Python
  • Applied Text Mining in Python
  • Applied Social Network Analysis in Python (Capstone project)

For more information, see the Coursera webpage.

Building a Community of Social Scientists with Big Data Skills: The ICOS Big Data Summer Camp

By | Educational, Feature, General Interest, News

As the use of data science techniques continues to grow across disciplines, a group of University of Michigan researchers is working to build a community of social scientists with skills in Big Data through a week-long summer camp for faculty and graduate students.

Having recently completed its fourth annual session, the Big Data Summer Camp held by the Interdisciplinary Committee for Organizational Studies (ICOS) trains approximately 50 people each spring in skills and methods such as Python, SQL, and social media APIs. Participants then split into several groups, each trying to answer a research question using these newly acquired skills.

Working with researchers from other fields is a key component of the camp, and of creating a Big Data social science community, said co-coordinator Todd Schifeling, a Research Fellow at the Erb Institute in the School of Natural Resources and Environment.

“Students meet from across social science disciplines who wouldn’t meet otherwise,” said Schifeling. “And every year we bring back more and more past campers to present on what they’ve been doing.”

Schifeling himself participated in the camp as a student before taking on the role of coordinator this year.

Teddy DeWitt, the other co-coordinator of the camp and a doctoral student at the Ross School of Business, added that the camp presents the curriculum in a way that is unique on campus.

“This set of material does not seem to be available in other parts of the university, at least … with an applied perspective in mind,” he said. “So we’re glad we have this set of resources that is both accessible and well-received by students.”

Participants range in skill from beginning to advanced, but even a relatively advanced student like Jeff Lockhart, a doctoral student in sociology and population studies who describes himself as “super-committed to computational social science,” said that it’s hard to find classes in computational methods in social science departments.

“[The ICOS camp] doesn’t expect a lot of prior knowledge, which I think is critical,” Lockhart said.

Lockhart, DeWitt, and Dylan Nelson, also a sociology doctoral student, are working on setting up a series of workshops in Computational Social Science for fall 2016 (contact Lockhart at jwlock@umich.edu for more information). Lockhart said it’s critical that social scientists learn Big Data skills.

“If we don’t have skills like this, there’s no way for us to enter into these fields of research that are going to be more and more important,” he said.

“A lot of the skills we’ve learned are sort of the on-ramp for doing data science,” DeWitt added.

The camp is co-sponsored by Advanced Research Computing (ARC).

New on-campus data-science and computational research services available

By | Feature, General Interest, News

Researchers across campus now have access to several new services to help them navigate the new tools and methodologies emerging for data-intensive and computational research.

As part of the U-M Data Science Initiative announced in fall 2015, Consulting for Statistics, Computing and Analytics Research (CSCAR) is offering new and expanded services, including guidance on:

  • Research methodology for data science.
  • Large scale data processing using high performance computing systems.
  • Optimization of code and use of Flux and other advanced computing systems.
  • Advanced data management.
  • Geospatial data analyses.
  • Exploratory analysis and data visualization.
  • Obtaining licensed data from commercial sources.
  • Scraping, aggregating and integrating data from public sources.
  • Analysis of restricted data.

“With Big Data and computational simulations playing an ever-larger role in research in a variety of fields, it’s increasingly important to provide researchers with a comprehensive ecosystem of support and services that address those methodologies,” said CSCAR Director Kerby Shedden.

As part of this significant expansion of its scope, the campuswide statistical consulting service CSCAR has been renamed Consulting for Statistics, Computing and Analytics Research. It was formerly known as the Center for Statistical Consultation and Research.

For more information, see the University Record article.