When we approach modeling as a series of design choices, we highlight the assumptions and subjective value judgements made at each stage, and begin to expose the biases embedded within our models.
1 minute, talk to your neighbors:
Authors | Reading | Watching
---|---|---
Cathy O'Neil | |
Virginia Eubanks | | Automating Inequality, PBS 2018
Safiya Umoja Noble | |
Meredith Broussard | |
Janelle Shane | | The danger of AI is weirder than you think, TED 2019
Hannah Fry | | Should Computers Run the World?, Royal Institution 2019
Caroline Criado Perez | | Invisible Women, Engage 2019
Ruha Benjamin | | Ruha's resources for Race After Tech
Melanie Mitchell | | The Collapse of Artificial Intelligence, Santa Fe Institute 2019
Sasha Costanza-Chock | |
Kate Crawford | |
Catherine D'Ignazio & Lauren F. Klein | |
"Our success, happiness, and wellbeing are never fully of our own making. Others' decisions can profoundly affect the course of our lives...
Arbitrary, inconsistent, or faulty decision-making thus raises serious concerns..."
- Fairness and Machine Learning, Barocas, Hardt, and Narayanan
When handing over the tools of mathematics,
we are responsible as educators
for teaching their responsible use.
It is a sin of omission when we fail to acknowledge the consequences of the content we teach; these consequences include both ethical and technical pitfalls.
1. Data: get the data
2. Preprocess: clean up the data
3. Explore: explore the data
4. Model: model it
5. Communicate: share the results
• Design
∘ Turn a problem into a data-problem.
∘ Survey or experimental design
∘ Database infrastructure
• Acquire
∘ Survey or experiment
∘ Download the dataset! CSV, API, etc.
∘ Web scraping
• Wrangle
∘ Format
∘ Clean and organize
∘ Check data integrity
• Prepare
∘ Label
∘ Split into training and testing sets
∘ Normalize
• Visualize
∘ Plot and familiarize with data
∘ Look for and compare features visually
∘ Consider appropriate models
• Inspect
∘ Exploratory data analysis
∘ Descriptive statistics
∘ Identify features analytically
• Model
∘ Try and compare multiple models
∘ Consider bias and variance
∘ Interpret model and performance
• Validate
∘ Assess model performance on independent test data
∘ Error analysis and stress-test
∘ Consider consequences
• Reflect
∘ Consider contexts, bias, and consequence
∘ Create an audit plan
∘ Document the data and model
• Share
∘ Report documentation
∘ Inform policy
∘ Deploy in product
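The stages above, from acquisition through validation, can be sketched as plain functions. This is a minimal, hypothetical illustration: the synthetic data, the linear model, and all names are assumptions for the sketch, not a real pipeline.

```python
import random

# Hypothetical end-to-end sketch of the workflow stages above,
# with synthetic data standing in for a survey, download, or scrape.

def acquire(n=100, seed=0):
    """Acquire: generate (x, y) pairs with y ≈ 2x plus noise (made up here)."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + rng.gauss(0, 0.1))
            for x in [rng.random() for _ in range(n)]]

def prepare(data, train_frac=0.8):
    """Prepare: shuffle, then split into training and testing sets."""
    data = list(data)
    random.Random(1).shuffle(data)
    k = int(train_frac * len(data))
    return data[:k], data[k:]

def fit(train):
    """Model: least-squares slope for a line through the origin."""
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def validate(slope, test):
    """Validate: mean absolute error on independent test data."""
    return sum(abs(y - slope * x) for x, y in test) / len(test)

train, test = prepare(acquire())
slope = fit(train)
print(f"slope {slope:.2f}, test MAE {validate(slope, test):.3f}")
```

Each function maps onto one stage of the pipeline; the reflect and share stages have no code analogue here, which is exactly the point of the slides that follow.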
• Data: harmful data collection, lack of consent, insecure or absent privacy, historical, representational, or measurement bias, ...
• Preprocess: labor exploitation, labeling by non-experts, incorrect labeling, trauma experienced by labelers, ...
• Explore: feature selection bias, bias in interpretation of data visualizations, data manipulation, feature hacking, ...
• Model: bias in model choice, model-amplified bias, environmental impact, learning bias, evaluation bias, peripheral modeling, ...
• Communicate: biased model interpretation, ignoring variance, rejecting the model, deploying harmful products, deployment bias, ...
• Meta: "pernicious feedback loops", runaway homogeneity, susceptibility to adversarial attack, lack of oversight or auditing, ...
• Data: pose the data problem: what will be the bounce height \(h_{bounce}\) of my bouncy ball when dropped from rest from a given drop height \(h_{drop}\)? Record several slow-motion videos.
• Preprocess: randomly choose a subset of videos as the training set; parse the training-set videos into a table.
• Explore: create a scatter plot of \(h_{bounce}(h_{drop})\); look for features, notice and wonder, and consider models.
• Model: find a best-fit model on the training data; validate the model on the testing data.
• Communicate: reflect on the process; share out.
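A minimal sketch of the bouncy-ball fit, assuming synthetic "measurements" in place of heights parsed from the slow-motion videos, and assuming the relationship is linear, \(h_{bounce} \approx r \, h_{drop}\) for some rebound ratio \(r\). The ratio and noise level here are made-up values for illustration.

```python
import random

r_true = 0.64  # hypothetical rebound ratio, not a measured value
rng = random.Random(42)

# Data + preprocess: drop heights in meters, noisy bounce heights,
# shuffled and split into training and testing sets.
drops = [0.2 + 0.1 * i for i in range(15)]
pairs = [(h, r_true * h + rng.gauss(0, 0.01)) for h in drops]
rng.shuffle(pairs)
train, test = pairs[:10], pairs[10:]

# Model: best-fit line through the origin on the training data.
r_fit = sum(h * b for h, b in train) / sum(h * h for h, _ in train)

# Validate: mean absolute error on the held-out testing data.
mae = sum(abs(b - r_fit * h) for h, b in test) / len(test)
print(f"fitted ratio {r_fit:.3f}, test MAE {mae:.4f} m")
```

Fitting through the origin encodes the physical assumption that a ball dropped from zero height bounces to zero height; checking that assumption against the scatter plot belongs to the explore stage.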
(Figure: the dataset split into training data and testing data.)