Schedule
Please check this schedule frequently, as it will likely change throughout the quarter.
Legend:
- PA0 stands for Programming Assignment Primer
- PAX stands for Programming Assignment X, where X can take values {1, 2, 3, 4}, e.g., PA3 is the third programming assignment
- R stands for reading assignment
- P stands for quarter-long project
Lecture # | Date | Lecture | Keywords | Readings | Important Dates |
---|---|---|---|---|---|
1 | 03/19 | Course Overview and Introduction to the Data Science Process | Data science lifecycle. Ethics, fairness, responsibility, and privacy issues. | Reading 1.1: John P. A. Ioannidis. Why Most Published Research Findings Are False. PLOS Medicine, 2005. Reading 1.2: Michael Jordan. Artificial Intelligence: The Revolution Hasn’t Happened Yet. HDSR, 2019. | PA0 assigned; R1 assigned; P assigned |
2 | 03/21 | Pitfalls in Inferential Statistics | Multiple hypotheses, Bonferroni correction, false discovery rate, statistical vs practical significance | ||
3 | 03/26 | Data Context and Quality | collection, preparation, cleaning, missing data | Reading 2.1: Mark D. Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Nature Scientific Data, 2016. Reading 2.2: Stephen Stigler. Data Have a Limited Shelf Life. HDSR, 2019. | R1 due; PA0 due; R2 assigned; PA1 assigned |
4 | 03/28 | Causality and Experiments 1/2 | causal models, experiments (RCT) | ||
5 | 04/02 | Causality and Experiments 2/2 | causal inference from observational data, human subjects, A/B testing, experimental design | Reading 3.1: Department of Health, Education, and Welfare. The Belmont Report. April 18, 1979. Reading 3.2: Robert Bond, Christopher Fariss, et al. A 61-million-person experiment in social influence and political mobilization. Nature, 2012. | PA1 due; R2 due; R3 assigned; PA2 assigned |
6 | 04/04 | IRB (Cheryl Danton) | |||
7 | 04/09 | Discussion 1/3 | optimization vs generalization, training and test data, models, learning | Reading 4.1: Nithya Sambasivan et al. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. CHI 2021. Reading 4.2: Wendy Parker. Model Evaluation: An Adequacy-for-Purpose View. 2022 (read the introduction; the rest is optional). | R3 due; R4 assigned |
8 | 04/11 | Machine Learning in the Wild | training data, feature engineering, information leakage, concept drift, algorithmic decision making | | PA2 due |
9 | 04/16 | Fairness and Interpretability in Machine Learning | fairness definitions | Reading 5.1: Deirdre K. Mulligan, Joshua A. Kroll, Nitin Kohli, Richmond Y. Wong. This Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology. CSCW 2019. Reading 5.2: Julia Angwin, Jeff Larson, Surya Mattu, Lauren Kirchner. Machine Bias. ProPublica, May 23, 2016. | R4 due; R5 assigned; PA3 assigned |
10 | 04/18 | Visualization and Communication | packaging data products, reproducibility, repeatability, visualization, communication | ||
11 | 04/23 | Discussion 2/3 | | | R5 due |
12 | 04/25 | Introduction to Privacy 1/2 | privacy definitions, law, technology | | PA3 due |
13 | 04/30 | Introduction to Privacy 2/2 | data anonymization and deanonymization, k-anonymity, attacks, indigenous data sovereignty | Reading 6: Shoshana Zuboff. Big other: surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 2015. | R6 assigned |
14 | 05/02 | Statistical Data Privacy | differential privacy, sensitivity | | PA4 assigned |
15 | 05/07 | Data Flows, Lifecycles, Data Markets | provenance, right to be forgotten, data portability, data brokers, data ownership, value of data, data unions, cooperatives, strikes | Reading 7: Edith Ramirez, Julie Brill, Maureen K. Ohlhausen, Joshua D. Wright, Terrell McSweeny. Data Brokers: A call for transparency and accountability. Federal Trade Commission, May 2014 (read the Executive Summary and then Section 4, “Types of Products”). | R6 due; R7 assigned |
16 | 05/09 | Discussion 3/3 | |||
17 | 05/14 | | | | R7 due; PA4 due |
18 | 05/16 | AMA | |||
| | 05/17 | No class | | | P due |