Due: Tuesday 6PM 1/24
In this assignment, you will explore a dataset through three different interaction methods: SQL, Python, and Tableau.
The aim is to compare and contrast the different options in terms of ease of use, power, expressiveness, etc.
The data you will work with comes from the US Bureau of Transportation Statistics. It contains one large table with 109 columns and about 520,000 rows. The database describes all the flights that occurred on US territory in January 2015, along with delay information. You may find the dataset HERE.
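To make the comparison between methods concrete, here is a minimal sketch that answers the same question (average departure delay per carrier) once with SQL and once with plain Python. It runs on a handful of made-up rows, not on the real data, and the table name `flights` and the columns `carrier`, `origin`, and `dep_delay` are assumptions for illustration only; check the actual schema before adapting it.

```python
import sqlite3
from collections import defaultdict

# Toy rows standing in for the real table. The column layout
# (carrier, origin, dep_delay) is hypothetical, not the dataset's schema.
ROWS = [
    ("AA", "JFK", 12.0),
    ("AA", "LAX", -3.0),
    ("DL", "JFK", 30.0),
    ("DL", "ATL", 0.0),
    ("UA", "ORD", 45.0),
]


def avg_delay_sql(rows):
    """Average departure delay per carrier, computed with a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights (carrier TEXT, origin TEXT, dep_delay REAL)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT carrier, AVG(dep_delay) FROM flights "
        "GROUP BY carrier ORDER BY carrier"
    ).fetchall()
    conn.close()
    return dict(result)


def avg_delay_python(rows):
    """The same aggregate, computed with an explicit loop."""
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [sum, count]
    for carrier, _origin, delay in rows:
        totals[carrier][0] += delay
        totals[carrier][1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}


if __name__ == "__main__":
    print(avg_delay_sql(ROWS))
    print(avg_delay_python(ROWS))
```

Both functions return the same mapping; the SQL version states what is wanted, while the Python version spells out how to compute it.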
Here are some examples of questions that the database can answer:
For this part of the assignment, you will use a hosted data science service called Instabase and Jupyter.
Create an account on Instabase using the following Special Registration URL
When Instabase prompts for a token, use prof-wu-spring2017
Go to the course Instabase repository
Select the HW1 folder, click on “more”, then “Copy”, then pick your repository, and copy to Instabase Drive.
HW1 should now be in the fs/Instabase Drive/ folder of your repository.
For us to understand what you did, please enable logging in your notebook. To do so, go to:
This will log every execution in your notebook. We will release this back to the class as an interesting dataset.
Open hw1.ipynb and follow the instructions for the SQL section.
Open hw1.ipynb and follow the instructions for the Python section.
Finally, you will perform data exploration with Tableau.
Go to the Tableau website and download a demo version of Tableau.
Connect Tableau to the OnTime database, hosted on a PostgreSQL server we set up for the course. To do so, create a new Tableau workbook based on the following server:
Explore the dataset using Tableau. Come up with 3 visualizations that show new insights. Upload their screenshots into your HW1 directory.
IMPORTANT: Make a copy of your Tableau log file and send it with your results. The log file we are interested in is called “tabprotosrv.txt”.
Please make sure that logging works before you start exploring. To do so, issue a few dummy queries, open the “tabprotosrv.txt” file, and make sure that it contains SQL statements.
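If you would rather script this check than read the file by eye, the following is a rough sketch. It assumes only that each SQL statement in the log has a `SELECT ... FROM` on a single line; the actual layout of “tabprotosrv.txt” is not described here, so treat the pattern as a starting point. The function names are made up for this example.

```python
import re
from pathlib import Path

# Assumption: a logged query shows up as "SELECT ... FROM ..." on one line.
SQL_PATTERN = re.compile(r"\bSELECT\b.+\bFROM\b", re.IGNORECASE)


def count_sql_lines(text):
    """Count the lines of `text` that look like they contain a SQL query."""
    return sum(1 for line in text.splitlines() if SQL_PATTERN.search(line))


def log_has_sql(path):
    """Return True if the log file at `path` has at least one SQL-like line."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_sql_lines(text) > 0
```

For example, `log_has_sql("tabprotosrv.txt")` should return `True` after you have issued a few dummy queries, provided the assumption about the log layout holds.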
You will submit your HW1/ directory.
Include a file named <youruni>.txt in your HW1/ directory.
For each approach, list the insights that you identified from your analysis. One insight per line:
SQL: insight 1
SQL: insight 2
...
Python: insight 1
Python: insight 2
...
Tableau: insight 1
...
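Before submitting, you may want to confirm that your insights file follows this one-insight-per-line format. The sketch below is one possible check, not an official grader: it assumes every non-blank line must start with `SQL:`, `Python:`, or `Tableau:` followed by some text, and the function name `check_insights` is made up for this example.

```python
ALLOWED_PREFIXES = ("SQL:", "Python:", "Tableau:")


def check_insights(text):
    """Return (line_number, line) pairs for lines that break the format.

    Blank lines are ignored. A line is flagged if it does not start with
    one of the allowed prefixes, or if nothing follows the prefix.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith(ALLOWED_PREFIXES):
            problems.append((number, stripped))
            continue
        body = stripped.partition(":")[2].strip()
        if not body:
            problems.append((number, stripped))
    return problems
```

An empty list means every line matched; otherwise each entry points at a line to fix.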
Copy your Tableau log file into your HW1 directory. Rename it to <youruni>_tableau.txt. The “Method 3” section of this document describes how to get the logs.
Add your Tableau screenshots to your HW1 directory.
Submit your HW1 directory using this submission link
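As a last step before submitting, a short script can confirm that the expected files are present. This is only a sketch: it assumes the notebook hw1.ipynb lives directly in the HW1 directory next to the two text files, and it does not check the screenshots because their file names are not specified. The helper names and the example UNI `ab1234` are hypothetical.

```python
from pathlib import Path


def expected_names(uni):
    """File names this sketch assumes should be in the HW1 directory."""
    return ["hw1.ipynb", f"{uni}.txt", f"{uni}_tableau.txt"]


def missing_from(present_names, uni):
    """Return the expected names that are absent from `present_names`."""
    present = set(present_names)
    return [name for name in expected_names(uni) if name not in present]


def missing_files(hw_dir, uni):
    """Same check, run against the files actually in `hw_dir`."""
    names = [p.name for p in Path(hw_dir).iterdir() if p.is_file()]
    return missing_from(names, uni)
```

Calling `missing_files("HW1", "ab1234")` returns an empty list when nothing is missing.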
Read the papers listed on the course website!