Schedule
Please check this schedule frequently, as it will likely change a bit throughout the quarter.
Legend:
- PA stands for Programming Assignment
- IP stands for Individual Project
- SR stands for Issue Report
- R stands for Reading Assignment
Lecture # | Date | Lecture | Keywords | Readings | Important Dates |
---|---|---|---|---|---|
1 | 03/30 | Course Overview. Introduction to the process of Data Science | introduction, descriptive statistics, inferential statistics | Reading 1: Solon Barocas and Andrew D. Selbst. Big Data’s Disparate Impact. 104 California Law Review 671, 2016. | IP Proposal assigned, SR assigned |
2 | 04/01 | Data: Context and Quality | collection, preparation, cleaning, missing data | ||
3 | 04/06 | Pitfalls in Inferential Statistics | multiple hypothesis testing, Bonferroni correction, false discovery rate, statistical vs practical significance | Reading 2.1: danah boyd and Kate Crawford. Critical Questions for Big Data. Information, Communication, and Society, 2012. Reading 2.2: Stephen Stigler. Data Have a Limited Shelf Life. HDSR, 2019. Reading 2.3: Michael Jordan. Artificial Intelligence: The Revolution Hasn’t Happened Yet. HDSR, 2019. | PA1 assigned, IP Proposal due, R1 due |
4 | 04/08 | The design of experiments and protection of human subjects | human subjects, experimental design, A/B testing | ||
5 | 04/13 | More experimental design. Causality | experiments, causality, observational vs experimental data | Reading 3.1: Department of Health, Education, and Welfare. The Belmont Report. April 18, 1979. Reading 3.2: Michelle N. Meyer. Everything You Need to Know About Facebook’s Controversial Emotion Experiment. Wired, June 30, 2014. Reading 3.3: Robinson Meyer. Everything We Know About Facebook’s Secret Mood Manipulation Experiment. The Atlantic, June 28, 2014. | R2 due |
6 | 04/15 | Causation vs Correlation. Causal Inference | pitfalls in communicating results, introduction to causal inference, quasi-experiments | ||
7 | 04/20 | Introduction to Machine Learning | optimization vs generalization, training and test data, models, learning | | PA1 due, R3 due |
8 | 04/22 | Machine Learning in the Wild | training data, feature engineering, information leakage, concept drift, algorithmic decision making | | PA2 assigned |
9 | 04/27 | Fairness in Machine Learning | fairness definitions | Reading 4.1: Julia Angwin, Jeff Larson, Surya Mattu, Lauren Kirchner. Machine Bias. ProPublica, May 23, 2016. Reading 4.2: Solon Barocas, Moritz Hardt, Arvind Narayanan. Fairness and Machine Learning, Chapter 2: Classification. fairmlbook.org, 2019. | |
10 | 04/29 | Visualization and Communication | packaging data products, reproducibility, repeatability, visualization, communication | | PA3 assigned |
11 | 05/04 | Philosophy of Privacy | privacy definitions, law, technology | Reading 5.1: Daniel Solove. ‘I’ve Got Nothing to Hide’ and Other Misunderstandings of Privacy. San Diego Law Review 44, 2007. Reading 5.2: Arvind Narayanan and Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets. In Proc. IEEE Symposium on Security and Privacy, 2008. | R4 due |
12 | 05/06 | Anonymity | deanonymization | | PA2 due |
13 | 05/11 | Statistical Data Privacy | differential privacy, sensitivity | | R5 due |
14 | 05/13 | Data Lifecycles | provenance, right to be forgotten, data portability | | PA4 assigned, PA3 due |
15 | 05/18 | Economics of Data and Externalities 1 | data ownership, value of data, data markets, markets for privacy, data brokers | Reading 6: Hoi Wai Jackie Cheng. Economic properties of data and the monopolistic tendencies of data economy: policies to limit an Orwellian possibility. ST/ESA/2020/DWP/164, May 17, 2020. | |
16 | 05/20 | Economics of Data and Externalities 2. Other topics | data unions, cooperatives, strikes | | IP due |
17 | 05/25 | Project Presentation 1 | | | PA4 due, R6 due |
18 | 05/27 | Project Presentation 2 | | | SR due, PA Quiz (24-hour window opens) |