
Python for Data Analysis

This workshop, held over two half-days, will provide participants with an overview of pandas, a Python library for data analysis. There will be both lecture and hands-on computer work using Python (via the Spyder IDE and IPython notebook) on real datasets (crash data will be used). Topics will include how to load different data formats (CSV, Excel, Access database, etc.); clean, manipulate, process, and crunch data; graph results (using matplotlib and bokeh); and write results to an output file. Computers will be provided, but participants are encouraged to bring their own laptops and datasets to work with using pandas. Call (734) 764-7828 to register.
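As a minimal sketch of the load, clean, crunch and write steps listed above, the following uses a small made-up crash table held in memory. The column names, values and the choice to treat a missing injury count as zero are illustrative assumptions, not the workshop's dataset or method.

```python
import io

import pandas as pd

# Hypothetical crash records; a real file would be read with pd.read_csv("some_file.csv").
raw = io.StringIO(
    "crash_id,county,injuries\n"
    "1,Washtenaw,2\n"
    "2,Wayne,\n"
    "3,Washtenaw,1\n"
    "4,Wayne,3\n"
)
crashes = pd.read_csv(raw)

# Clean: treat a missing injury count as zero (an assumption) and store it as an integer.
crashes["injuries"] = crashes["injuries"].fillna(0).astype(int)

# Crunch: total injuries per county.
totals = crashes.groupby("county")["injuries"].sum()

# Write: produce CSV text; passing a file path to to_csv() writes to disk instead.
csv_text = totals.to_csv()
print(csv_text)
```

Reading from an in-memory string keeps the sketch self-contained; the same calls apply unchanged to a file on disk.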

 

U-M, Coursera offer five-course specialization in Applied Data Science with Python

By | Educational, General Interest, News

Coursera and the University of Michigan are offering a five-course specialization in Applied Data Science with Python starting in September. The courses cost $79 each, and students who complete all coursework, including a capstone project, will receive a Certificate.

The courses, taught by U-M faculty members Christopher Brooks (SI), Kevyn Collins-Thompson (SI and EECS), Daniel Romero (SI and EECS) and VG Vinod Vydiswaran (Medical School and SI), are:

  • Introduction to Data Science in Python
  • Applied Plotting, Charting and Data Representation in Python
  • Applied Machine Learning in Python
  • Applied Text Mining in Python
  • Applied Social Network Analysis in Python (Capstone project)

For more information, see the Coursera webpage.

Data management and analysis with Python and Pandas

Pandas aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Topics will include how to read various data formats (CSV, Excel, databases, etc.), clean, manipulate, analyze and graph data, and write results to an output file. Real-world data will be used. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Registration: http://ttc.iss.lsa.umich.edu/ttc/sessions/data-management-and-analysis-with-python-and-pandas/

Geospatial analysis with Python

This workshop will cover basic geospatial analysis in Python. Topics will include reading and writing various GIS file formats (shapefile, KML, GeoJSON, CSV), geocoding, and common geometric operations such as finding the closest line to a point, point-in-polygon tests, spatial indexing and spatial joins. The workshop will focus solely on vector data (points, lines, polygons). This will mostly be accomplished using the Python modules fiona, shapely and rtree (but not arcpy).
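To illustrate two of the geometric operations named above (point in polygon and closest line to a point), here is a plain-Python sketch. It deliberately does not use shapely, so the function names and sample coordinates are hypothetical, and it assumes planar (projected) coordinates rather than latitude and longitude.

```python
import math


def point_in_polygon(point, polygon):
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(point, a, b):
    """Shortest distance from a point to the line segment a-b."""
    px, py = point
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical sample data: a square parcel and two street centerlines.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
streets = {"main_st": ((0, 5), (4, 5)), "side_st": ((6, 0), (6, 4))}
point = (1, 3)

is_inside = point_in_polygon(point, square)
closest = min(streets, key=lambda name: distance_to_segment(point, *streets[name]))
print(is_inside, closest)
```

Checking every street in turn is fine for a handful of features; a spatial index such as the one rtree provides is what makes the same query practical on large layers.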

Web Scraping with Python: HTML

This workshop will provide an overview of how to scrape data from HTML pages and website APIs using Python. For demonstration purposes, we will scrape sports and weather data. This will mostly be accomplished using the Python requests, beautifulsoup and retry modules. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.
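The core idea of scraping, pulling structured values out of HTML markup, can be sketched with the standard library alone. This example swaps in Python's built-in html.parser for beautifulsoup and parses an inline, made-up scores table instead of fetching a live page with requests; the class name and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this text first.
HTML = """
<table id="scores">
  <tr><th>Team</th><th>Points</th></tr>
  <tr><td>Home</td><td>21</td></tr>
  <tr><td>Away</td><td>17</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(HTML)

# The first row is the header; the remaining rows are the records.
header, *records = parser.rows
scores = {team: int(points) for team, points in records}
print(header, scores)
```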

Web Scraping with Python: Twitter and Google APIs

This workshop will cover how to obtain data from Twitter and Google APIs. We’ll cover how to search historical tweets and also obtain data from future tweets using the streaming Twitter API. We will also cover the Google Maps API and APIs related to social media among the wide variety of Google APIs. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Mixed models in Python

Mixed models (also known as multilevel models or random effects models) are used in research involving data with repeated measures per observation unit.  The Python statsmodels package is able to fit a variety of mixed models including variance components models, models for longitudinal data, and models with crossed effects.  We will cover the basics of mixed models, then present examples highlighting the capabilities of this Python package.
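A random-intercept model is one of the simplest mixed models. The sketch below fits one with the statsmodels formula interface on simulated data; the variable names, group sizes and true parameter values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate repeated measures: 30 units with 8 observations each (made-up sizes).
rng = np.random.default_rng(0)
n_groups, n_per_group = 30, 8
group = np.repeat(np.arange(n_groups), n_per_group)
n_obs = n_groups * n_per_group

unit_effect = rng.normal(0.0, 1.0, size=n_groups)  # one random intercept per unit
x = rng.normal(size=n_obs)
y = 2.0 + 0.5 * x + unit_effect[group] + rng.normal(0.0, 0.5, size=n_obs)

data = pd.DataFrame({"y": y, "x": x, "group": group})

# Fixed effect for x, random intercept for each group.
model = smf.mixedlm("y ~ x", data, groups=data["group"])
result = model.fit()

# Estimated fixed effects (intercept and slope for x).
print(result.fe_params)
```

Because the data are simulated with a known slope of 0.5, the fitted fixed effect for x should land close to that value, which gives a quick sanity check on the model specification.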

Data Processing and Visualization in Python

This workshop will delve into common data processing and exploration techniques. We will use NumPy, Matplotlib and Pandas to perform data exploration in Python. Among other things, we'll demonstrate how to load data files, sort data, group variables, handle missing values, merge/join datasets and create common plots. Although it is not required, we recommend that all participants have a basic knowledge of Python.
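Several of the operations listed above (handling missing values, merging, grouping and sorting) can be sketched in a few lines of Pandas. The two small tables and their column names are made up for illustration, and filling a gap with the column mean is only one of several possible choices.

```python
import numpy as np
import pandas as pd

# Hypothetical tables: individual scores and a lookup of department buildings.
people = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "dept": ["stats", "stats", "bio", "bio"],
    "score": [82.0, np.nan, 91.0, 75.0],
})
depts = pd.DataFrame({
    "dept": ["stats", "bio"],
    "building": ["Building A", "Building B"],
})

# Handle missing values: fill the gap with the column mean (an assumption).
people["score"] = people["score"].fillna(people["score"].mean())

# Merge/join: attach the building to each person via the shared "dept" column.
merged = people.merge(depts, on="dept", how="left")

# Group: mean score per department.
by_dept = merged.groupby("dept")["score"].mean()

# Sort: highest score first.
ranked = merged.sort_values("score", ascending=False)
print(by_dept)
print(ranked)
```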

Data science with social science data

This workshop covers the essential steps of data analysis in Python, using social science data as a case study. The workshop is divided into two parts. The first session includes an introduction to Python's NumPy and Pandas data analysis libraries. This session requires no previous experience with Python. We will cover common steps involved in any data analysis, from loading the data to running a regression and interpreting outcomes.

The second session requires some background knowledge of Python, which the first session provides. It covers more advanced features, from various potential preprocessing steps to using scikit-learn's machine learning tools to analyze the data. As in the first session, we will be using an example from the social sciences.

The two sessions will be held in a computer lab and participants will be able to work either individually or in small groups on a few practice exercises.
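The "running a regression" step can be sketched with NumPy alone as an ordinary least squares fit. The five data points below are invented for illustration and do not come from the workshop's social science dataset.

```python
import numpy as np

# Made-up predictor and outcome values.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize the sum of squared residuals.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Interpretation: the slope is the expected change in y per one-unit change in x.
print(intercept, slope)
```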

 
