Overview

Human beings rely on summarizing and visualizing data to make informed decisions. The number and volume of datasets continue to grow at exponential rates, and new user-facing systems and modalities are needed to handle the scale and heterogeneity of future data. This course surveys the landscape of interactive data exploration systems along several axes.


Staff+OH       Eugene Wu (Instructor)   Weds 4-5PM
               Thibault Sellam          By Appt
Meetings       Weds 2-4PM               503 Hamilton Hall
                                        First 1.5 hours paper discussion.
                                        Last 30 minutes open discussion.
Units          3
Grading        Questions                10%
               Participation            15%
               Assignments              15%
               Project                  60%
               Presentation             0-10% extra credit
               If publishable quality   >10-20% extra credit
Communication  Piazza                   Aside from personal questions, use Piazza instead of email.
               Course Github

Course Expectations

What This Class is NOT

What I expect from You

Assignments

For assignments, you are allowed 5 penalty-free late days to use throughout the semester. One late day equals one 24-hour period after the due date of the assignment. Once you have used your late days, there will be a 20% penalty for each day an assignment is late. You do not need to explicitly declare the use of late days; we will assign them to you in a way that is optimal for your grade when different assignments are worth different numbers of points. Late days may not be used for the final project.
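To make the "optimal for your grade" rule concrete, here is a minimal sketch in Python of one way the free late days could be spread across late submissions. It is an illustration, not the staff's actual procedure. The names `penalized_score` and `best_allocation` are hypothetical, and the sketch assumes that the 20% penalty is taken from the score you earned and that a score cannot drop below zero; the policy above does not spell out either detail.

```python
from itertools import product

FREE_LATE_DAYS = 5       # stated in the policy above
PENALTY_PER_DAY = 0.20   # stated in the policy above


def penalized_score(raw_score, days_late, free_days_used):
    """Score after the late penalty.

    Assumptions (not stated in the policy): the penalty is a fraction of
    the raw score, and it is capped so the score never goes below zero.
    """
    penalized_days = days_late - free_days_used
    penalty = min(1.0, PENALTY_PER_DAY * penalized_days)
    return raw_score * (1.0 - penalty)


def best_allocation(submissions, free_days=FREE_LATE_DAYS):
    """Try every way to spread the free late days and keep the best one.

    `submissions` is a list of (raw_score, days_late) pairs. Returns
    (best_total_score, free_days_assigned_to_each_submission).
    """
    # A submission can absorb between 0 and days_late free days.
    choices = [range(days_late + 1) for _, days_late in submissions]
    best_total, best_plan = -1.0, None
    for plan in product(*choices):
        if sum(plan) > free_days:
            continue  # this plan uses more free days than allowed
        total = sum(
            penalized_score(raw, late, used)
            for (raw, late), used in zip(submissions, plan)
        )
        if total > best_total:
            best_total, best_plan = total, plan
    return best_total, best_plan


# Hypothetical semester: a 10-point HW 3 days late, a 30-point HW 4 days
# late, and a 20-point HW turned in on time.
total, plan = best_allocation([(10, 3), (30, 4), (20, 0)])
print(plan, total)
```

In this made-up example the best plan is `(1, 4, 0)`: four free days cover the 30-point assignment entirely, the remaining free day goes to the 10-point assignment, and the total is 56 of 60 points. Exhaustive search is used rather than a greedy rule because, once the penalty is capped, covering the most valuable assignment first is not always optimal.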

Project (semester-long)

You will pursue a semester-long research project related to this course. The project is a significant part of the course grade.

Paper Questions (every class)

You are expected to answer the short questions associated with the readings for every class. Your answers must be submitted by 9PM the day before class.

Add your answers to the appropriate lecture’s wiki page

Paper Presentations

You have the option to present as a group (1-2 people) for one lecture on a topic/paper of your choice (within reason). The paper(s) you select can be from the list given below. You are also free to propose a paper of your choice as long as it matches the themes of the class. Your list of papers must be submitted by midnight on Feb 1.

You will be asked to complete three milestones for the presentation. Their purpose is to ensure high presentation quality; they are also a good excuse to practice your presentation skills and get feedback:

  1. 2 weeks before your presentation: present to Professor Wu
  2. 1 week before your presentation: present to two or more classmates and get feedback. The classmates should send me their notes from the presentation.
  3. Day of class: give an awesome presentation

Submit the teammates and papers to present

Schedule

Day    Presenter                       Papers                                         Notes/Due
1/18   Eugene                          Introduction
1/25   Eugene                          Specification.                                 Readings + Qs
                                                                                      HW 1
2/01   Eugene                          Performance overview, end-to-end systems       Readings + Qs
                                                                                      Submit presentation requests
                                                                                      Turn in project teams in class!
2/08   Eugene                          Sampling                                       Readings + Qs
                                                                                      Project Prospectus Due
                                                                                      Stream HW 1 released
2/15   Eugene                          Prefetching/Network                            Readings + Qs
                                                                                      Stream HW 1 due
                                                                                      Stream HW 2 released
2/22   Gabriel/Daniel                  Specialized Systems: Macrobase (and BlinkDB?)  Readings + Qs
                                                                                      Evaluation functions for Stream HW 2 due.
3/01   Alireza/Luren                   Dremel                                         Readings + Qs
                                                                                      Prediction functions for Stream HW 2 due 3/5.
3/08   Eugene                          Explanation + Midpoint Review                  Readings + Qs
                                                                                      HW 4 is out
3/15                                   NO CLASS. Spring Break!
3/22   Thibault                        Modalities
3/29   Brennan/Drashko                 Recommendation + Summarization
4/05   Patrick Shafto (guest lecture)  TBA
4/12   Eugene                          Work on projects in class. Thibault+Wu will help.
4/19   Thibault                        Web Tables. Wu (may) be away at ICDE           Readings + Qs
                                                                                      HW 5 is out
4/26   Eugene/Thibault                 Cleaning?
5/03                                   Poster Presentation + submit writeups by 5/5

Topics

Background you should be comfortable with

Classics

Surveys

Specifying Visualizations and Exploration

Declarative Visualization Languages

Interaction Modalities

Augmenting User Exploration

Recommendation

Autocomplete and refinement

Explanation

Garbage in Garbage out

Data Cleaning

Scaling Visual Exploration Systems

End-to-end fast data visualization systems

Data Processing Systems

Prefetching

Sampling

Network

Neat applications