University of Illinois

October 18-19, 2018

9:00 am - 4:30 pm

Instructors: Ashley Hetrick, Dena Strong, Elizabeth Wickes, Colleen Fallaw

General Information

This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data. The workshop is for anyone who has data they want to analyze, and no prior computational experience is required.

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: 1030 National Center for Supercomputing Applications.

When: October 18-19, 2018.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody.

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g., sign-language interpreters, lactation facilities), please get in touch using the contact details below and we will attempt to provide them.

Acknowledgments: Local Software Carpentry and Data Carpentry workshops are made possible by the generous support of Computational Science and Engineering, Technology Services, the National Center for Supercomputing Applications, HPCBio at the Roy J. Carver Biotechnology Center with support through the Office of the Vice Chancellor for Research, the Deloitte Center for Business Analytics at the Gies College of Business, and the home units of each of our instructors.

Contact: Please email training@cse.illinois.edu for more information.


Schedule

Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

Day 1 · Thurs 10/18

Morning: Data Organization with Spreadsheets
Afternoon: OpenRefine

Day 2 · Fri 10/19

Morning: SQL
Afternoon: Python

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

Data Organization and Spreadsheets

  • Good data entry practices - formatting data tables in spreadsheets
  • How to avoid common formatting mistakes
  • Approaches for handling dates in spreadsheets
  • Basic quality control and data manipulation in spreadsheets
  • Exporting data from spreadsheets
  • Reference...

OpenRefine

  • What is OpenRefine and why to use it
  • Similarities to and differences from Excel
  • Data cleaning and manipulation
  • Clustering, filtering, faceting, and exploring your data
  • Repeatable actions and version history
  • Exporting your results
  • Reference...

Databases and SQL

  • What is SQL and why to use it
  • Understanding the relationship between spreadsheets and databases
  • Exploring a simple database
  • Designing queries to find results
  • Modifying database contents
  • Reference...

Python

  • What is Python and why to use it
  • Data manipulation with Python
  • File slicing
  • Using loops to repeat actions
  • Reference...

Setup

To participate in a Data Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser and a spreadsheet program such as Microsoft Excel or LibreOffice.

Python

Python is a popular language for research computing, and great for general-purpose programming as well.

We will teach Python using Repl, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

SQLite and SQLiteStudio

SQL is a specialized programming language used with databases. We use a simple database manager called SQLite in our lessons.

SQLiteStudio provides a graphical user interface that combines helpful visual displays and point-and-click interactions with a command-line interface, so you can work with a database in whichever manner you prefer. See the SQLiteStudio website for installation instructions.
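SQLite keeps a whole database in a single ordinary file (or, for quick experiments, in memory) and does not need a separate server process. For the curious, here is a minimal sketch of the kind of query the SQL session covers, using Python 3's built-in `sqlite3` module rather than SQLiteStudio. The `surveys` table, its columns, and its rows are hypothetical values made up for illustration, not the workshop dataset, and you do not need to run this to take part.

```python
import sqlite3

# An in-memory database: nothing is written to disk and no server is needed.
conn = sqlite3.connect(":memory:")

# Hypothetical table, loosely shaped like a field-survey dataset.
conn.execute(
    "CREATE TABLE surveys ("
    "record_id INTEGER PRIMARY KEY, year INTEGER, species_id TEXT, weight REAL)"
)
conn.executemany(
    "INSERT INTO surveys (year, species_id, weight) VALUES (?, ?, ?)",
    [(1977, "NL", 120.0), (1977, "DM", 40.0), (1978, "DM", 45.0), (1978, "DM", None)],
)

# A typical query: number of records and average weight per species.
# COUNT(*) counts every row; AVG(weight) ignores NULL (missing) weights.
rows = conn.execute(
    "SELECT species_id, COUNT(*), AVG(weight) "
    "FROM surveys GROUP BY species_id ORDER BY species_id"
).fetchall()

for species_id, n, mean_weight in rows:
    print(species_id, n, mean_weight)

conn.close()
```

Running it prints `DM 3 42.5` and then `NL 1 120.0`: all three DM rows are counted, but the average skips the row whose weight is missing.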

OpenRefine

For this lesson you will need OpenRefine and a web browser. Note: OpenRefine is a Java program that runs on your own machine (not in the cloud). Although you use it through a web browser, it does not need an internet connection.

Windows

Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.

Download software from http://openrefine.org/

Create a new directory called OpenRefine.

Unzip the downloaded file into the OpenRefine directory by right-clicking and selecting "Extract ...".

Go to your newly created OpenRefine directory.

Launch OpenRefine by double-clicking openrefine.exe (google-refine.exe in older versions). A command prompt window will open; you can ignore it and wait for OpenRefine to open in your browser.

If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

Mac

Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It may not run correctly in Safari.

Download software from http://openrefine.org/.

Create a new directory called OpenRefine.

Unzip the downloaded file into the OpenRefine directory by double-clicking it.

Go to your newly created OpenRefine directory.

Install OpenRefine by dragging its icon into the Applications folder.

Launch it by Ctrl-clicking the icon and choosing Open.

If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

Linux

Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser.

Download software from http://openrefine.org/.

Make a directory called OpenRefine.

Unzip the downloaded file into the OpenRefine directory.

Go to your newly created OpenRefine directory.

Launch OpenRefine by entering ./refine into the terminal within the OpenRefine directory.

If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.
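If OpenRefine seems not to have started, you can check whether anything is answering at its default address. The following is a minimal, optional sketch assuming Python 3 and only its standard library; the function name `openrefine_is_running` is hypothetical, and a positive result only means that some web server is listening on that port, which the code assumes is OpenRefine.

```python
import urllib.error
import urllib.request

# OpenRefine's default local address, as given in the setup steps above.
DEFAULT_URL = "http://127.0.0.1:3333/"


def openrefine_is_running(url: str = DEFAULT_URL, timeout: float = 3.0) -> bool:
    """Return True if a web server answers at `url` (assumed to be OpenRefine)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status code, so it is running.
        return True
    except OSError:
        # URLError (e.g. connection refused) and socket timeouts are both
        # OSError subclasses: nothing usable is listening at that address.
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("Something is answering at", DEFAULT_URL)
    else:
        print("Nothing is answering at", DEFAULT_URL, "so OpenRefine may not be running yet.")
```

An HTTP error status such as 404 still counts as "running" here, because the only question is whether the local server is up; opening the address in your browser remains the simplest check.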