GIS Fundamentals – Spatial Database, PostGIS

PostGIS, built on top of PostgreSQL, is the most powerful open-source relational database for managing spatial data. In this workshop we will cover the basic concepts of spatial databases, learn about setting up PostGIS, and understand how PostGIS can help us efficiently manage large volumes of vector data spread over multiple tables and geometries. We will also touch upon topics such as spatial indexing and the capabilities of PostGIS for other 2-D GIS data models, such as the network and raster data models.

GIS Fundamentals – V (Spatial Database – PostGIS)

This is the fifth workshop in a series of workshops we are offering this semester on the fundamentals of GIS. Each workshop covers one or two key elements of GIS and is somewhat self-contained. The focus is on conceptual details that can provide sufficient preparation for applications, but we will also touch upon the technical aspects.

In this workshop we will cover the basic concepts of spatial databases and learn about setting up and using PostGIS, an open-source spatial database built on top of PostgreSQL, along with R for vector data analysis. We will also touch upon topics such as spatial indexing, query processing, and the capabilities of PostGIS for other data models such as the network and raster data models. This is a hands-on workshop, and the instructor will use a Mac. If you intend to use a Windows or Linux machine, please get in touch with the instructor before the workshop at manishve@umich.edu.

CoreLogic property data

The University of Michigan library system has licensed a large data set containing real estate transactions, deeds, and property tax records for the United States.  The data were collected by the commercial vendor CoreLogic, and our license allows UM researchers to use the data for research purposes.  These data are of potential interest to researchers in many fields, as they capture spatial and temporal real estate market conditions, taxing practices, and the physical states of millions of residential structures in the US.
 
In this workshop, participants will learn to create geographical subsets of the data, integrate them seamlessly into their workflows, and see examples of research questions where the data can be useful. Participants should know Python and R.

Interdisciplinary Committee on Organizational Studies (ICOS) Big Data Summer Camp, May 14-18

By | Data, Educational, General Interest, Happenings, News
Social and organizational life are increasingly conducted online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” Within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “using big data,” and we will all be utilizing terms like API and Python.
This year ICOS, MIDAS, and ARC are again offering a one-week “big data summer camp” for doctoral students interested in organizational research, with a combination of detailed examples from researchers; hands-on instruction in Python, SQL, and APIs; and group work to apply these ideas to organizational questions.  Enrollment is free, but students must commit to attending all day for each day of camp, and be willing to work in interdisciplinary groups.

The camp runs all day, May 14-18.

U-M partners with Cavium on Big Data computing platform

By | Feature, General Interest, Happenings, HPC, News

A new partnership between the University of Michigan and Cavium Inc., a San Jose-based provider of semiconductor products, will create a powerful new Big Data computing cluster available to all U-M researchers.

The $3.5 million ThunderX computing cluster will enable U-M researchers to, for example, process massive amounts of data generated by remote sensors in distributed manufacturing environments, or by test fleets of automated and connected vehicles.

The cluster will run the Hortonworks Data Platform providing Spark, Hadoop MapReduce and other tools for large-scale data processing.

“U-M scientists are conducting groundbreaking research in Big Data already, in areas like connected and automated transportation, learning analytics, precision medicine and social science. This partnership with Cavium will accelerate the pace of data-driven research and open up new avenues of inquiry,” said Eric Michielssen, U-M associate vice president for advanced research computing and the Louise Ganiard Johnson Professor of Engineering in the Department of Electrical Engineering and Computer Science.

“I know from experience that U-M researchers are capable of amazing discoveries. Cavium is honored to help break new ground in Big Data research at one of the top universities in the world,” said Cavium founder and CEO Syed Ali, who received a master of science in electrical engineering from U-M in 1981.

Cavium Inc. is a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking. The new U-M system will use dual-socket servers powered by Cavium’s ThunderX ARMv8-A workload-optimized processors.

The ThunderX product family is Cavium’s 64-bit ARMv8-A server processor for next generation Data Center and Cloud applications, and features high performance custom cores, single and dual socket configurations, high memory bandwidth and large memory capacity.

Alec Gallimore, the Robert J. Vlasic Dean of Engineering at U-M, said the Cavium partnership represents a milestone in the development of the College of Engineering and the university.

“It is clear that the ability to rapidly gain insights into vast amounts of data is key to the next wave of engineering and science breakthroughs. Without a doubt, the Cavium platform will allow our faculty and researchers to harness the power of Big Data, both in the classroom and in their research,” said Gallimore, who is also the Richard F. and Eleanor A. Towner Professor, an Arthur F. Thurnau Professor, and a professor both of aerospace engineering and of applied physics.

Along with applications in fields like manufacturing and transportation, the platform will enable researchers in the social, health and information sciences to more easily mine large structured and unstructured datasets. This will eventually allow researchers, for example, to correlate health outcomes and disease outbreaks with information derived from socioeconomic, geospatial and environmental data streams.

U-M and Cavium chose to run the cluster on Hortonworks Data Platform, which is based on open source Apache Hadoop. The ThunderX cluster will deliver high-performance computing services for Hadoop analytics and, ultimately, a total of three petabytes of storage space.

“Hortonworks is excited to be a part of forward-leading research at the University of Michigan exploring low-powered, high-performance computing,” said Nadeem Asghar, vice president and global head of technical alliances at Hortonworks. “We see this as a great opportunity to further expand the platform and segment enablement for Hortonworks and the ARM community.”

Big Data: Improving the Scope, Quality and Accessibility of Financial Data

The Office of Financial Research and the University of Michigan will host a joint conference, “Big Data: Improving the Scope, Quality, and Accessibility of Financial Data” in Ann Arbor, Michigan.  The conference will bring together a wide range of scholars, regulators, policymakers, and practitioners to explore how Big Data can be used to enhance financial stability and address other challenges in financial markets.

U-M telecast of XSEDE Big Data workshop

XSEDE and the Pittsburgh Supercomputing Center are presenting a one-day Big Data workshop. This workshop will focus on topics such as Hadoop and Spark. U-M is one of several sites around the country that will host a telecast of the session. Registration is required, as space is limited.

Schedule:

11:00 Welcome
11:25 Intro to Big Data
11:45 Hadoop
12:15 Hadoop (continued)
1:00 Lunch break
2:00 Exercises
2:45 Spark
3:45 Exercises
4:15 A Big Big Data Platform
5:00 Adjourn