Web Scraping with Python

This workshop will provide an overview of how to scrape data from HTML pages and website APIs using Python. This will mostly be accomplished using the requests, beautifulsoup, and retry modules, together with the browser's developer tools. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Data Science with Social Science data: an introduction to Pandas and StatsModels in Python

This workshop introduces participants to Python’s NumPy, Pandas DataFrames, Matplotlib and StatsModels using an advertising dataset. Participants will use these tools to model (OLS) associations between advertising expenditures and product sales in example data. We will start with an introductory explanation of Anaconda and the Jupyter notebook environment (although not required for the participant, the instructor will be using these tools). We will proceed with topics including: reading data files; creation, indexing and slicing of Pandas DataFrames; creation and handling of Matplotlib objects; and creation and interpretation of models using Python’s StatsModels. Although not required, we recommend that participants have a basic knowledge of Python.

Pandas Dataframes: Data Processing and Visualization in Python

This workshop will delve into common data processing and exploration techniques. We will use Pandas to perform data exploration in Python. Among other things, we’ll demonstrate how to load data files, sort data, group variables, merge/join datasets and create common plots. Although not required, we recommend that all participants have a basic knowledge of Python.
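As a rough preview of the operations listed above, here is a minimal sketch of loading, sorting, grouping and merging with Pandas. The store and manager data are hypothetical and are read from in-memory strings so the example is self-contained; the workshop's own datasets and column names will differ.

```python
import io

import pandas as pd

# Hypothetical example data; a real session would read files from disk.
sales_csv = io.StringIO(
    "store,region,sales\n"
    "A,North,120\n"
    "B,South,80\n"
    "C,North,100\n"
)
managers_csv = io.StringIO(
    "store,manager\n"
    "A,Kim\n"
    "B,Lee\n"
    "C,Cruz\n"
)

# Load the data files into DataFrames.
sales = pd.read_csv(sales_csv)
managers = pd.read_csv(managers_csv)

# Sort rows by sales, largest first.
sorted_sales = sales.sort_values("sales", ascending=False)

# Group by region and total the sales within each group.
region_totals = sales.groupby("region")["sales"].sum()

# Merge (left join) the two tables on their shared "store" column.
merged = sales.merge(managers, on="store", how="left")

print(sorted_sales)
print(region_totals)
print(merged)
```

With Matplotlib installed, a call such as region_totals.plot(kind="bar") would draw a simple bar chart of the grouped totals.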

Regular Expressions II

Regular expressions are perfectly suited for people who like puzzles. A regular expression is a sequence of characters that defines a search pattern. Regular expressions are commonly used for “find” and “find and replace” string operations, and to validate strings such as phone numbers and passwords during data entry. Regular expression capabilities can be found in a variety of programming languages and software, including ArcGIS, Java, JavaScript, MATLAB, Perl, PHP, Python, R and Visual Basic, as well as in some text editors. This workshop is part II of a two-part series and will cover more advanced topics like captured groups, backreferences and assertions. The workshop will consist of hands-on example problems. A basic understanding of regular expressions is required. You should be able to understand expressions like “\w{3,}-\d{1,2}-\d{4}” and “des*ert?s?”. The tutorials will be conducted using Python. A basic programming background is helpful but not required for this workshop.
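To give a sense of the level involved, the sketch below uses Python's standard re module. It first tries the two prerequisite expressions quoted above, then shows one small instance each of a captured group, a backreference and an assertion. The sample strings are made up for illustration and are not taken from the workshop materials.

```python
import re

# The two prerequisite patterns quoted in the description.
date_like = re.compile(r"\w{3,}-\d{1,2}-\d{4}")
dessert = re.compile(r"des*ert?s?")

is_date_like = bool(date_like.fullmatch("March-5-2019"))
dessert_hits = dessert.findall("desert dessert deserts")

# Captured groups: parentheses save each part of the match separately.
grouped = re.fullmatch(r"(\w{3,})-(\d{1,2})-(\d{4})", "March-5-2019")
month, day, year = grouped.groups()

# Backreference: \1 must repeat whatever group 1 matched, so this
# pattern finds a doubled word and the substitution keeps one copy.
deduplicated = re.sub(r"\b(\w+) \1\b", r"\1", "this is is a test")

# Assertions: a lookahead or lookbehind checks the context around a
# match without including that context in the result.
usd_amounts = re.findall(r"\d+(?= USD)", "Cost: 40 USD, 15 EUR")
dollar_amounts = re.findall(r"(?<=\$)\d+", "$25 and 30")

print(is_date_like, dessert_hits)
print(month, day, year)
print(deduplicated)
print(usd_amounts, dollar_amounts)
```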

Mixed models in Python

Mixed models (also known as multilevel models or random effects models) are used in research involving data with repeated measures per observation unit. The Python statsmodels package is able to fit a variety of mixed models including variance components models, models for longitudinal data, and models with crossed effects. We will cover the basics of mixed models, then present examples highlighting the capabilities of this Python package.

Data Science with Social Science data: An introduction to Pandas and StatsModels in Python

This workshop introduces participants to Python’s NumPy, Pandas DataFrames, Matplotlib and StatsModels using an advertising dataset. Participants will use these tools to model (OLS) associations between advertising expenditures and product sales in example data. We will start with an introductory explanation of Anaconda and the Jupyter notebook environment (although not required for the participant, the instructor will be using these tools). We will proceed with topics including: reading data files; creation, indexing and slicing of Pandas DataFrames; creation and handling of Matplotlib objects; and creation and interpretation of models using Python’s StatsModels. Although not required, we recommend that participants have a basic knowledge of Python.

Data Processing and Visualization in Python

This workshop will delve into common data processing and exploration techniques. We will use NumPy, Matplotlib and Pandas to perform data exploration in Python. Among other things, we’ll demonstrate how to load data files, sort data, group variables, handle missing values, merge/join datasets and create common plots. Although not required, we recommend that all participants have a basic knowledge of Python.

Web Scraping with Python: Twitter and Google APIs

This workshop will cover how to obtain data from Twitter and Google APIs. We’ll cover how to search historical tweets and also obtain data from future tweets using the streaming Twitter API. We will also cover the Google Maps API and APIs related to social media among the wide variety of Google APIs. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Web Scraping with Python: HTML

This workshop will provide an overview of how to scrape data from HTML pages and website APIs using Python. For demonstration purposes, we will scrape sports and weather data. This will mostly be accomplished using the Python requests, beautifulsoup and retry modules. The workshop is intended for users with basic Python knowledge. Anaconda Python 3.5 will be used.

Data science with social science data

This workshop covers the essential steps of data analysis in Python, using social science data as a case study. The workshop is divided into two sessions. The first session includes an introduction to Python’s NumPy and Pandas data analysis libraries. This session requires no previous experience with Python. We will cover common steps involved in any data analysis: from loading the data to running a regression and interpreting outcomes.

The second session requires the background knowledge of Python provided by the first session. It covers more advanced features, from various potential preprocessing steps to using scikit-learn machine learning tools to analyze the data. As in the first session, we will be using an example from the social sciences.

The two sessions will be held in a computer lab and participants will be able to work either individually or in small groups on a few practice exercises.