As part of the University’s Data Science Initiative, CSCAR consultants will provide enhanced support for data science tools and methodologies beginning in spring 2016. CSCAR data science consulting supports research in all areas where large and complex data play a role. We provide guidance on:

  • Obtaining licensed data from commercial sources.  We work with the UM Library system to procure such data.  See the library data grants program for funding opportunities.
  • Scraping, aggregating, and integrating data from public sources. We can help you assemble datasets of terabyte size and beyond from open sources such as the NCBI, NOAA, the U.S. Census Bureau, Twitter, or any other website or public digital data source.
  • Analysis of restricted data. We can advise you on infrastructure solutions including ARMIS and the forthcoming Glovebox systems, and external systems such as the U.S. Census Research Data Centers.
  • Data management. CSCAR staff have experience preparing large and complex raw data for analysis using a variety of scripting languages and tools. We can devise solutions using advanced database and data processing tools such as Spark, Hadoop, MongoDB, and Vertica. CSCAR staff can advise you on record linkage and other methods for preparing unstructured data for analysis. We also have experience with specialized data sources, such as data obtained from instruments and sensors, and with geospatial, text, image, video, and sound data.
  • High performance data processing. CSCAR staff have expertise in profiling and optimizing data processing code. This includes using performance profiling tools, optimizing data layouts, applying concurrency and parallelization, improving disk seek performance, reducing cache misses, using data compression, and developing high performance code in compiled languages where appropriate.
  • Research methodology for data science. CSCAR staff have expertise in methodological aspects of data science including assessing representativeness of data, adjusting for sampling biases, uncertainty quantification, and sensitivity analyses.
  • Data exploration. CSCAR consultants have experience in data visualization and in methods for conducting hypothesis-generating research with large datasets.
  • Streaming data. We can provide guidance on methods for streaming data analytics, which apply when the data cannot be stored in their entirety or when only a limited number of passes through the data is feasible.
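
To give a flavor of the streaming case, here is a minimal Python sketch of a single-pass summary: it computes a running mean and sample variance with Welford's update, so each value is seen once and nothing is stored. The names `RunningStats` and `summarize` are hypothetical and chosen for illustration only; this is one common approach, not a description of any specific CSCAR tool.

```python
from typing import Iterable


class RunningStats:
    """Running mean and sample variance computed in one pass (Welford's update)."""

    def __init__(self) -> None:
        self.n = 0          # number of values seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the summary without storing it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume an iterable once, keeping only constant-size state."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    # A generator stands in for a data source too large to hold in memory.
    readings = (float(i % 10) for i in range(1_000_000))
    result = summarize(readings)
    print(result.n, result.mean, result.variance)
```

Because the state is three numbers regardless of stream length, the same pattern can be applied to a file read line by line or to records arriving over a network connection.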

For more information, contact cscar@umich.edu.