Extracting Data from PDFs

October 31 @ 3:00 pm - 4:30 pm

Modern Languages Building (MLB), Room 2001A

Do you have useful information locked in a data table inside a PDF? Such tables are notoriously hard to extract with standard PDF text-mining tools. Come explore Tabula with us, a tool for extracting these datasets. Tabula works with text-based PDFs, not scanned documents, which may require OCR. The workshop will work through some hands-on examples. Tabula can be used from a number of languages, including Java, Ruby, Node.js, R, and Python. The workshop will be conducted in Python, using Anaconda Python 3.5 and a Jupyter Notebook, and is intended for participants with some programming background.

Details

Date:
October 31
Time:
3:00 pm - 4:30 pm
Event Categories:
Other

Class size
27
U-M Affiliated Fee
0
Instructors
Alex Cao