## Instructor:

*David LeBauer, Ph.D.*

University of Illinois

email: dlebauer@illinois.edu

web: davidlebauer.com

## Course Objectives

A two-week course that introduces graduate students from the Department of Mathematics to methods in software development, data science, and analysis. The goal is to prepare students to apply their understanding of math to problems in industry.

## Requirements

### Code of Conduct

All participants must read and abide by our Code of Conduct.

### Preparation

Please do the following **before class starts**:

- Create an account at github.com
- Fill out the pre-workshop survey
- Sign up for the PI4 Slack channel (requires an email address ending in illinois.edu or hawaii.edu)
- Complete the “Introduction to R” and “Intermediate R” courses on DataCamp (I will send invitations)

### Expectation: Familiarity with basic syntax and operations in R

Although the course is aimed at students with limited experience using software, you are expected to complete two free introductory courses to become familiar with the basic syntax and operations in R. Both courses are **required** before the start of the second day of the course (May 22).

### Materials: Computers and Software

The only software requirement is a modern web browser. The classroom is equipped with desktop computers, though students are encouraged to bring laptops. Much of the instruction and collaborative work will be done using the NDS Labs Workbench, which provides Shell, R, and Python editors as well as access to large datasets and databases.

## Logistics

### Location:

239 Altgeld Hall

University of Illinois Department of Mathematics

1409 West Green Street

Urbana, IL

#### Time: 9AM - 5PM

Each day we will have two fifteen-minute breaks and a one-hour lunch break from 12:00 to 1:00.

#### Dates: May 21 – May 26, 2018

- May 21: Scientific Computing Fundamentals
- May 22 – May 25: Data and Statistics in R
- May 26: Conclusion and Project Presentations

#### Daily Schedule:

Time | Activity |
---|---|
9:00–9:30 | Review, questions, overview |
9:30–10:30 | Topic 1 |
10:45–11:00 | Break |
11:00–12:00 | Group Projects |
12:00–1:00 | Lunch |
1:00–2:00 | Topic 2 |
2:00–3:00 | Topic 3 |
3:00–3:15 | Break |
3:15–5:00 | Group Projects |

## Course Schedule

The following schedule is subject to change based on student feedback and interests.

### Day 1: Computing Fundamentals

Monday May 21

- The Terminal (SWC The Unix Shell)
    - file system navigation
    - scripting
    - control flow

- Version Control (SWC Git Novice 1-6)
    - committing changes
    - branching
    - merging

- Collaborative Coding (SWC Git Novice 7-14)
    - GitHub
    - Code Reviews

- Software Development
    - Reproducible Research
    - Agile / Scrum

- Group Projects: Setup
    - Overview of available data
    - Overview of scientific questions
    - Divide into teams
    - Set up GitHub repository
    - Formulate projects

### Day 2: Getting started with R

Tuesday May 22

- Getting Started with R and RStudio (SWC 1-3)
- R Markdown and Reproducible Research
- Loading and Evaluating Data
    - data types
    - vectorization

- Control Flow: if, else, for (SWC 7)
- Visualization (SWC 8)
- Data Manipulation
- Project
    - import and explore data
    - exploratory data analysis
    - summarize and plan next steps

The first half of the day will follow the R Novice Gapminder lesson http://swcarpentry.github.io/r-novice-gapminder/

### Day 3: Databases and Visualization

Wednesday May 23

- Data Cleaning and Exploratory Analysis
    - Data Cleaning with OpenRefine (DC lesson 1-4)
    - Data Cleaning in R
    - Scatter Plots

- Data Structures
    - Spreadsheets (DC lesson)
    - Relational databases
    - Non-relational databases
    - Raster data and databases

- Querying Databases
    - SQL
    - Connecting from R using the dplyr package

- Data Curation
    - Metadata and Vocabularies
    - Publishing Data, Archives and Repositories

- Visualization
    - bestiary of plots: which plots for which data
    - Turning tables into graphs (Gelman et al. 2002)
    - Beyond bar and line graphs (Weissgerber et al. 2015)
    - Tufte, sparklines

- Project
    - more plots
    - summary of available data

### Day 4: Probability and Statistics

Thursday May 24

- Probability Distributions
    - Bestiary, meaning, PDFs (Bolker Ch4, Dietze EE509)
    - Stochastic Simulation (Bolker Ch5)

- Summary Statistics
    - Estimates of central tendency, variance, shape
    - Fitting PDFs
        - parameter estimation
        - goodness of fit (*L*, AIC, BIC, DIC)

- Statistical Modeling
    - Regression
    - Functions
    - Dynamic Models

- Projects
    - Develop and apply QA/QC metrics
    - Make sure reports can be automated

### Day 5

Friday May 25

- Model Building
    - Descriptive Analysis
    - Hypothesis Driven Analysis

- Model Fitting
    - Frequentist, Bayesian
    - Inference and Prediction

- Multilevel Modeling
    - ANOVA (Gelman et al. 2005)
    - GLM
    - HB

### Day 6: Project Wrapup and Presentations

Saturday May 26

- Student-requested topics
    - more details on an earlier topic (advanced visualization?)
    - Shiny apps
    - dimensionality reduction and clustering?

- Finish projects
- Presentations