COLUMBIA UNIVERSITY COMS W6998
SYSTEMS FOR HUMAN DATA INTERACTION

Important Dates

Percentages are of your total class grade.

Updates

Overview

The major portion of your grade is based on the research project. It should take about 3-4 weeks to complete.

Teams should consist of 1–3 people. In addition, if you have a project in mind, please indicate briefly (1–2 sentences) what you are thinking. We have included a list of possible projects at the end of this document, although you are not required to choose from these.

Good class projects can vary dramatically in complexity, scope, and topic. The only requirement is that they be related to something we have studied in this class and that they contain some element of research – e.g., that you do more than simply engineer a piece of software that someone else has described or architected. To help you determine if your idea is of reasonable scope, we will arrange to meet with each group several times throughout the semester.

Initial Prospectus

Your final research paper will describe the research problem, its importance, your hypothesis, related work, technical details, and evaluation. The prospectus is a sketch to get you to think about these aspects. You will focus on describing the research problem and your hypothesis. You will also provide a first pass at related work, a short two-paragraph description of how you plan to complete the project, and the metrics you will use to decide whether it worked.

You should meet with Professor Wu prior to deciding your project.

Your prospectus should follow the example:

Submission

  1. Rename your prospectus file to the following format, with UNIs in alphabetical order: prospectus_<UNI1>_.._<UNIn>.pdf
  2. Click here to upload the file by 2/19 11:59PM EST
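The naming rule in step 1 can be scripted. The following is a minimal sketch, assuming the `..` in the pattern stands for the remaining UNIs joined by underscores. The helper name `submission_filename` and the example UNIs are hypothetical and only serve as illustration.

```python
def submission_filename(prefix, unis):
    """Build '<prefix>_<UNI1>_.._<UNIn>.pdf' with UNIs in alphabetical order."""
    if not unis:
        raise ValueError("at least one UNI is required")
    # Normalizing case and whitespace is an assumption, not stated in the assignment
    ordered = sorted(u.strip().lower() for u in unis)
    return f"{prefix}_{'_'.join(ordered)}.pdf"


# Hypothetical UNIs, used only as an example
print(submission_filename("prospectus", ["xy9876", "ab1234"]))
# prints: prospectus_ab1234_xy9876.pdf
```

The same helper applies to the later deliverables by changing the prefix (for example `related` or `final`).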

You will submit an updated version of your prospectus that contains a revised introduction (problem statement, hypothesis) and a substantially fleshed out related work section. It should clearly articulate the novelty of the problem with respect to the state of the art. You will need to find and review related literature, and look for software tools that may be related to your problem.

Keep in mind that Professor Wu will assess this part of the project based on how well it answers the broad question “is there a well-motivated technical challenge and a clear hypothesis of how the problem will be solved?”. This corresponds to the threats to validity in Munzner’s nested model framework. It can be broken down into:

Some helpful tips:

Submission

  1. Rename the file to the following format, with UNIs in alphabetical order (DO NOT REUSE THE NAME OF YOUR PROSPECTUS!): related_<UNI1>_.._<UNIn>.pdf
  2. Click here to upload the file by 3/6 11:59PM EST

Prototype Check in

Your group will schedule 20 minutes to meet with Professor Wu to go over the project’s progress and receive feedback. The first 5 minutes will be a short presentation with 4 slides (roughly 1 minute per slide). The rest of the time will be an open discussion in which you field questions.

Slides should cover:

  1. Problem and motivation
  2. Related work and challenges
  3. Progress so far
  4. Plan for rest of the project

Submission

Project Showcase

Your team will prepare and present a project poster at the end-of-course showcase session. This is your opportunity to give a short presentation and demo of your work and to show what you have accomplished in the class!

Your presentation should be polished. Since there is still time until the final report, you are encouraged to also discuss ideas or challenges you are still considering.

Since you are presenting to your peers as well, make sure you give your colleagues enough context to understand your ideas. In addition to what you did, help your colleagues understand why you made your specific choices, and provide examples. It’s better to make sure the audience learns a few specific ideas than try to say everything. Come to office hours or contact the staff if you would like feedback.

Overall logistics

Your presentation should cover (in content, not necessarily one slide for each point)

Submission

Report

You will prepare a conference-style report on your project with a maximum length of 12 pages (10 pt font or larger, one or two columns, 1 inch margins, single or double spaced; more is not better). Your report should expand upon your prospectus and introduce and motivate the problem your project addresses, describe related work in the area, discuss the elements of your solution, and present results that measure the behavior, performance, or functionality of your system (with comparisons to other related systems as appropriate).

Because this report is the primary deliverable upon which you will be graded, do not treat it as an afterthought. Plan to leave at least a week to do the writing, and make sure you proofread and edit carefully!

Submission

  1. Rename the file to the following format, with UNIs in alphabetical order: final_<UNI1>_.._<UNIn>.pdf
  2. Click here to upload the file by 5/10 11:59PM EST

Project Suggestions

The following are examples of possible projects – they are by no means a complete list and you are free to select your own projects. In fact, a common source of ideas is to take your experience from another domain, and combine it with ideas from human data interaction. Another approach is to take concepts from the papers we read, and apply them to another domain. Projects often come in several flavors:

  1. Research project: model an unsolved problem, propose or extend an algorithmic solution, evaluate and report findings.
  2. Design: identify an underserved data problem for which a sound, composable interface doesn’t exist, propose an interface and interaction design, build and evaluate it.
  3. Fill a gap: think about something useful that should be easily doable, but is painful or impossible with the current state of the art. Fill that gap.

New Querying Interfaces

Scalable image and video databases are on the horizon. However, a major limitation is that the query interface is incredibly impoverished. How do you specify that you want to find red cars that move along a trajectory? Or to look for relationships between two objects over time? Certainly not by writing SQL-like text queries. The challenge is that video is fundamentally 3D, but query interfaces are 1D.

What We Talk About When We Talk About Data

How are data and analyses referred to and described in scientific work? When data is presented as figures or tables, how is it referred to? What are the verbs and nouns? Is there a universal set of ways that figures are described (e.g., in terms of comparisons? in relative terms?). This can serve as the evidence for a new data analysis language.
Analyze papers in arXiv or Viziometrics for their figures, captions, and surrounding text (arXiv provides LaTeX files).
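As a possible starting point for this analysis, the following minimal sketch pulls `\caption{...}` text out of a LaTeX source string and counts the words used. The function names and the sample LaTeX are hypothetical, and the sketch makes simplifying assumptions: it ignores escaped braces such as `\{`, and it counts raw words only. Separating verbs from nouns would need a part-of-speech tagger from an NLP library, which is not shown here.

```python
import re
from collections import Counter


def extract_captions(tex):
    """Return the text of every \\caption{...} in a LaTeX source string."""
    captions = []
    # Allow an optional short caption in brackets: \caption[short]{long}
    for match in re.finditer(r"\\caption\s*(?:\[[^\]]*\])?\s*\{", tex):
        depth, start = 1, match.end()
        i = start
        # Track brace depth so nested {...} stay inside the caption
        while i < len(tex) and depth > 0:
            if tex[i] == "{":
                depth += 1
            elif tex[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            captions.append(tex[start:i - 1])
    return captions


def word_counts(captions):
    """Count lowercase words across captions, ignoring LaTeX command names."""
    words = []
    for caption in captions:
        text = re.sub(r"\\[a-zA-Z]+", " ", caption)  # drop commands such as \textbf
        words.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(words)


# Hypothetical sample input, not taken from a real paper
sample = r"""
\begin{figure}
\caption{Runtime \textbf{increases} with input size; method A outperforms method B.}
\end{figure}
\begin{table}
\caption{Accuracy of method A compared to method B.}
\end{table}
"""
captions = extract_captions(sample)
print(captions)
print(word_counts(captions).most_common(3))
```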

A Task-oriented Language

Visualization tools and languages such as Polaris, Vega-Lite, and others focus on helping users specify the layout, visual encodings, and implicitly, the grouping and aggregations, of their data. However, choosing the appropriate aggregations, layouts, and visual encodings to answer a specific analysis task is quite challenging. For instance, suppose a dataset contains attributes A and B. If the task is “compare A and B”, then at first glance, a scatter plot makes sense. However, what if B only contains the two values “1990” and “2000”? Then, it makes more sense to compare the distributions of A for the years 1990 and 2000. Design a language that makes it easy for users to specify the task, and a compiler that generates the best visual presentation of the data for the task.
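To make the A and B example concrete, here is a minimal sketch of one rule such a compiler might apply. Everything in it is an assumption made for illustration: the function `choose_chart`, the `max_categories` threshold, and the returned dictionary format are hypothetical, and a real design would need many more tasks and rules.

```python
def choose_chart(task, a_values, b_values, max_categories=10):
    """Pick a chart for a 'compare' task using a simple cardinality rule (hypothetical)."""
    if task != "compare":
        raise ValueError(f"unsupported task: {task}")
    if len(a_values) != len(b_values):
        raise ValueError("A and B must have the same number of rows")
    distinct_b = set(b_values)
    if len(distinct_b) <= max_categories:
        # B has few distinct values: treat it as a grouping attribute
        # and compare the distribution of A within each group
        return {"chart": "distribution_per_group", "value": "A",
                "group_by": "B", "groups": sorted(distinct_b)}
    # Otherwise fall back to the first-glance choice
    return {"chart": "scatter", "x": "A", "y": "B"}


# B only contains the two values 1990 and 2000, as in the example above
print(choose_chart("compare", [3.1, 4.7, 2.2, 5.0], [1990, 2000, 1990, 2000]))
# B has many distinct values
print(choose_chart("compare", list(range(100)), list(range(100))))
```

The first call selects a per-group distribution comparison with groups 1990 and 2000; the second falls back to a scatter plot.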

Precision Interfaces

Precision Interfaces analyzes query logs and generates custom interaction components from them. The goal is to scalably generate dozens or hundreds of custom interactive analysis interfaces for any analysis found in a log.
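The general idea of turning a log into interaction components can be illustrated with a toy example. The sketch below is not the Precision Interfaces algorithm; it is a deliberately simplified stand-in that splits queries on whitespace, assumes every query in the log has the same number of tokens, and proposes a widget for each token position whose value changes. The function name `infer_widgets`, the widget vocabulary, and the sample log are all hypothetical.

```python
def infer_widgets(queries):
    """Propose one widget per token position that varies across same-shaped queries."""
    token_lists = [q.split() for q in queries]
    if not token_lists or len({len(t) for t in token_lists}) != 1:
        raise ValueError("this toy version needs a non-empty log of queries "
                         "with equal token counts")
    widgets = []
    # zip(*token_lists) yields, for each position, the tokens seen at that position
    for pos, column in enumerate(zip(*token_lists)):
        values = sorted(set(column))
        if len(values) > 1:
            numeric = all(v.lstrip("-").isdigit() for v in values)
            widgets.append({"position": pos,
                            "widget": "slider" if numeric else "dropdown",
                            "options": values})
    return widgets


# Hypothetical query log
log = [
    "SELECT avg(price) FROM sales WHERE year = 1990",
    "SELECT avg(price) FROM sales WHERE year = 2000",
    "SELECT avg(price) FROM sales WHERE year = 2010",
]
print(infer_widgets(log))
# prints: [{'position': 7, 'widget': 'slider', 'options': ['1990', '2000', '2010']}]
```

A project in this direction would likely work on parsed query structure rather than whitespace tokens, so that queries of different shapes can still be compared.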

Miscellaneous Ideas

Core Data Processing for Viz

PDFs + tables

Query The Web

Which Optimization Makes Sense?

Run some perceptual studies:

Data file formats

Modalities

Explanation and Cleaning

Recommendations and Predictions