
Ethics, Fairness, Responsibility, and Privacy in Data Science (DATA 25900) at The University of Chicago

259, Spring 24 edition

Schedule

Please check this schedule frequently, as it will likely change a bit throughout the quarter.

Legend:

PA0 stands for Programming Assignment Primer
PAX stands for Programming Assignment X, where X is one of {1, 2, 3, 4}; e.g., PA3 is the third programming assignment
R stands for reading assignment
P stands for quarter-long project

#Lecture	Date	Lecture	Keywords	Readings	Important Dates
1	03/19	Course Overview and Introduction to the Data Science Process	Data science lifecycle. Ethics, fairness, responsibility, and privacy issues.	Reading 1.1: John P. A. Ioannidis. Why Most Published Research Findings Are False. PLOS Medicine, 2005. Reading 1.2: Michael Jordan. Artificial Intelligence: The Revolution Hasn’t Happened Yet. HDSR, 2019.	PA0 assigned R1 assigned P assigned
2	03/21	Pitfalls in Inferential Statistics	Multiple hypotheses, Bonferroni correction, false discovery rate, statistical vs practical significance
3	03/26	Data Context and Quality	collection, preparation, cleaning, missing data	Reading 2.1: Mark D. Wilkinson et al. The FAIR Guiding Principles for scientific data management and stewardship. Nature Scientific Data, 2016. Reading 2.2: Stephen Stigler. Data Have a Limited Shelf Life. HDSR, 2019.	R1 due PA0 due R2 assigned PA1 assigned
4	03/28	Causality and Experiments 1/2	causal models, experiments (RCT)
5	04/02	Causality and Experiments 2/2	causal inference from observational data, human subjects, A/B testing, experimental design	Reading 3.1: Department of Health, Education, and Welfare. The Belmont Report. April 18, 1979. Reading 3.2: Robert Bond, Christopher Fariss et al. A 61-million-person experiment in social influence and political mobilization. Nature, 2012.	PA1 due R2 due R3 assigned PA2 assigned
6	04/04	IRB (Cheryl Danton)
7	04/09	Discussion 1/3	optimization vs generalization, training and test data, models, learning	Reading 4.1: Nithya Sambasivan et al. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. CHI 2021. Reading 4.2: Wendy Parker. Model Evaluation: An Adequacy-for-Purpose View. 2022 (read the introduction and, optionally, the rest).	R3 due R4 assigned
8	04/11	Machine Learning in the Wild	training data, feature engineering, information leakage, concept drift, algorithmic decision making		PA2 due
9	04/16	Fairness and Interpretability in Machine Learning	fairness definitions	Reading 5.1: Deirdre K. Mulligan, Joshua A. Kroll, Nitin Kohli, Richmond Y. Wong. This Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology. CSCW 2019. Reading 5.2: Julia Angwin, Jeff Larson, Surya Mattu, Lauren Kirchner. Machine Bias. ProPublica, May 23, 2016.	R4 due R5 assigned PA3 assigned
10	04/18	Visualization and Communication	packaging data products, reproducibility, repeatability, visualization, communication
11	04/23	Discussion 2/3			R5 due
12	04/25	Introduction to Privacy 1/2	privacy definitions, law, technology		PA3 due
13	04/30	Introduction to Privacy 2/2	data anonymization and deanonymization, k-anonymity, attacks, indigenous data sovereignty	Reading 6: Shoshana Zuboff. Big other: surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 2015.	R6 assigned
14	05/02	Statistical Data Privacy	differential privacy, sensitivity		PA4 assigned
15	05/07	Data Flows, Lifecycles, Data Markets	provenance, right to be forgotten, data portability, data brokers, data ownership, value of data, data unions, cooperatives, strikes	Reading 7: Edith Ramirez, Julie Brill, Maureen K. Ohlhausen, Joshua D. Wright, Terrell McSweeny. Data Brokers: A call for transparency and accountability. Federal Trade Commission, May 2014 (read the Executive Summary and then Section 4, “Types of Products”).	R6 due R7 assigned
16	05/09	Discussion 3/3
17	05/14	~~Summary of the quarter via a case study of LLMs~~ Statistics and the Census Bureau			R7 due PA4 due
18	05/16	AMA
	05/17	No class			P due