Project and content management for Contemporary Authors volumes
WORK TITLE: Weapons of Math Destruction
WORK NOTES:
PSEUDONYM(S): O’Neil, Catherine Helen
BIRTHDATE:
WEBSITE: https://mathbabe.org/
CITY: New York
STATE: NY
COUNTRY:
NATIONALITY:
https://en.wikipedia.org/wiki/Cathy_O’Neil * https://qz.com/819245/data-scientist-cathy-oneil-on-the-cold-destructiveness-of-big-data/ * http://www.latimes.com/books/jacketcopy/la-ca-jc-cathy-oneil-20161229-story.html
RESEARCHER NOTES:
PERSONAL
Married Aise Johan de Jong; children: three sons.
EDUCATION: University of California–Berkeley, B.A.; Harvard University, Ph.D., 1999.
ADDRESS
CAREER
Writer; analyst for Johnson Research Labs, New York, NY, 2013–. Formerly held positions in math departments of MIT and Barnard College; former data analyst for D.E. Shaw; former director of Lede program in Data Practices, Columbia University Graduate School of Journalism, Tow Center.
WRITINGS
SIDELIGHTS
A former math professor who also worked for several years as a hedge fund analyst, Cathy O’Neil left her career in finance after seeing how the faulty application of mathematical data had contributed to the catastrophic collapse of the U.S. real estate bubble in 2008. Her book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, shows how, in her view, mathematical models are used to serve political and economic agendas that exacerbate poverty, inequality, and racism and that diminish individual rights and power.
Weapons of Math Destruction
Though math is neutral in theory, O’Neil acknowledges in Weapons of Math Destruction, it is not so in practice. Algorithms, which are sets of rules or steps for performing a calculation or solving a problem, are often tweaked to produce results that serve the interest of those in power. This was blatantly true in the housing crisis, which the author blames on algorithms that were totally opaque and destructive, and served as tools in a system that, as she commented to London Guardian writer Mona Chalabi, is “rigged for the insiders.”
Math, O’Neil writes, can be used to target the poor, reinforce racist policies, and exacerbate inequality. The algorithms used to produce these results share three key features: they lack clarity and transparency; they are scalable, which means they can be applied to very large situations or systems; and they are unfair. As examples of the application of such algorithms, the author cites loan rates, prison sentencing, school acceptances, and even personality tests used on job applications. Algorithms aggregate data in varying ways, showing, for example, how often residents of a certain zip code default on mortgages or commit crimes. Those in power–whether banks, landlords, employers, or judges–use these algorithms to make decisions that affect individuals. For example, a person with a high credit score and good employment record may be charged a higher mortgage rate merely because of living in a neighborhood where a higher than normal percentage of people miss their loan payments. Similar uses of algorithms affect people’s access to jobs, schools, health care, and other necessities, and shape people’s interactions with law enforcement and government.
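O’Neil’s zip-code example can be made concrete in a few lines of code. The sketch below is purely illustrative (the zip codes, default rates, and weights are invented, and no real lender’s formula is implied); it shows how a neighborhood aggregate can override an individual’s own record:

# Toy illustration of proxy-based loan pricing; all numbers are invented.
# The zip-code default rate acts as a stand-in for the neighborhood, so an
# applicant with a strong personal record is still penalized for where they live.

ZIP_DEFAULT_RATES = {"10027": 0.12, "10583": 0.02}  # hypothetical aggregates

def quoted_rate(credit_score: int, zip_code: str, base_rate: float = 0.04) -> float:
    """Return an interest rate: individual risk plus a neighborhood surcharge."""
    individual_risk = max(0.0, (720 - credit_score) / 10_000)        # personal record
    neighborhood_risk = ZIP_DEFAULT_RATES.get(zip_code, 0.05) * 0.5  # opaque proxy
    return base_rate + individual_risk + neighborhood_risk

# Two applicants with identical personal records:
print(quoted_rate(780, "10583"))  # ~0.05: cheaper loan
print(quoted_rate(780, "10027"))  # ~0.10: same borrower, pricier loan

The borrower never sees ZIP_DEFAULT_RATES or the surcharge, which is what makes the model opaque as well as unfair.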
In the book’s final chapter, O’Neil discusses Facebook and its increasing reliance on algorithms to tweak its news feed and deliver targeted political content to users. For reviewer Peter Woit, writing in his blog Not Even Wrong, Facebook’s policies are “perhaps the most worrisome of all the Big Data concerns in the book,” because the site “exercises an incredible amount of influence over what information people see, with this influence sometimes being sold to the highest bidder.”
Commenting on Weapons of Math Destruction in CNN Online, Aimee Rawlins found its discussion of “recidivism models” particularly compelling. These models, which originated in an effort to reduce bias in courts’ sentencing of minority convicts, measure data that include prior convictions, home neighborhood, drug and alcohol use, previous interactions with police, and the criminal records of friends and family. Individuals receive a score that is the sum of these circumstances, and this number is used to determine sentencing. As O’Neil makes clear, these criteria are unfair and arbitrary, and would be objected to if identified openly during trial. But instead they are disguised as math, and numbers are seen as superior to human judgment and therefore incontrovertible. As the author explained to Rawlins: “The person is unlikely to know the mix of factors that influenced his or her sentencing–and has absolutely no recourse to contest them.”
In many ways, algorithms exacerbate poverty and racism. Individuals with lower credit ratings are less likely to be hired for higher-paying jobs, and more likely to live in higher-crime areas. Because of the income gap between white households and black ones, African Americans tend to have fewer resources to fall back on in an emergency. Therefore they are more likely to experience a downgrade in their credit rating, with resulting difficulties in future employment and living conditions.
Despite the enormity of the problem in relying on algorithms to manage social issues, O’Neil is optimistic that public awareness has begun to address the misuse of algorithms and to push for protections. She is confident that measures such as the Health Insurance Portability and Accountability Act (HIPAA) will be amended to protect a greater degree of personal data, and that new regulations will guarantee standardized transparency in data modeling. The author also believes that algorithmic models can be used for the public good. Models that predict recidivism, for example, can also be used to identify those inmates most in need of education and counseling. Similarly, zip codes associated with higher rates of crime can be targeted for more community-based policing initiatives to build trust between residents and law enforcement departments.
Weapons of Math Destruction was welcomed as a timely, important, and accessible book. Arthur Goldhammer, writing in the American Prospect, described the book as a “handy map to a few of the many areas of our lives over which invisible algorithms have gained some control.” A contributor to Kirkus Reviews deemed the book “an unusually lucid and readable look at the daunting algorithms that govern so many aspects of our lives.”
Doing Data Science
O’Neil is also the author, with Rachel Schutt, of Doing Data Science: Straight Talk from the Front Line. The book discusses the motivations and background necessary for successful entry into the interdisciplinary field of data science. The book includes case studies that illustrate data scientists’ use of algorithms, methodologies, and models.
John A. Wass, writing for Scientific Computing, found that the book presupposes basic familiarity with much of the subject matter, and focuses more heavily on the use of data science in business rather than in science. Even so, the reviewer said that the book contains “many useful hints, tips and tricks to addressing specific types of problems, as well as pitfalls.” Wass also praised the book’s excellent explanation of algorithms, as well as its often-fascinating asides featuring relevant individuals and anecdotes. Choice reviewer S.-T. Kim hailed Doing Data Science as a “valuable resource” for prospective students.
BIOCRIT
PERIODICALS
American Prospect, winter, 2017, Arthur Goldhammer, review of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, p. 109.
Choice, August, 2014, S.-T. Kim, review of Doing Data Science: Straight Talk from the Front Line, p. 2224.
Diverse Issues in Higher Education, November 17, 2016, Jamaal Abdul-Alim, review of Weapons of Math Destruction, p. 22.
Guardian, October 27, 2016, Mona Chalabi, review of Weapons of Math Destruction and interview with O’Neil.
Kirkus Reviews, August 1, 2016, review of Weapons of Math Destruction.
Library Journal, November 15, 2016, Rachel Bridgewater, review of Weapons of Math Destruction, p. 100.
New York Times, October 3, 2016, Clay Shirky, review of Weapons of Math Destruction.
Publishers Weekly, June 13, 2016, review of Weapons of Math Destruction, p. 87.
Scientific Computing, June, 2014, John A. Wass, review of Doing Data Science, p. 27.
ONLINE
Becoming a Data Scientist, http://www.becomingadatascientist.com/ (March 24, 2017), review of Doing Data Science.
Cathy O’Neil Home Page, https://mathbabe.org (March 24, 2017).
CNN Online, http://money.cnn.com/ (March 24, 2017), Aimee Rawlins, “Math Is Racist: How Data Is Driving Inequality.”
Inside Big Data, http://insidebigdata.com/ (March 24, 2017), Daniel Gutierrez, review of Doing Data Science.
Los Angeles Times Online, http://www.latimes.com/ (March 24, 2017), Christine Zhang, interview with O’Neil.
National Post Online, http://news.nationalpost.com/ (March 24, 2017), Molly Sauter, review of Weapons of Math Destruction.
National Public Radio Web site, http://www.npr.org/ (March 24, 2017), Kelly McEvers, All Things Considered interview with O’Neil.
Not Even Wrong, http://www.math.columbia.edu/ (March 24, 2017), Peter Woit, review of Weapons of Math Destruction.
Quartz, https://qz.com/ (March 24, 2017), Nikhil Sonnad, “Data Scientist Cathy O’Neil on the Cold Destructiveness of Big Data.”
Revolutions, http://blog.revolutionanalytics.com/ (March 24, 2017), Joseph Rickert, review of Doing Data Science.
Scientific American, https://blogs.scientificamerican.com/ (March 24, 2017), Evelyn Lamb, review of Weapons of Math Destruction.*
About
Cathy O’Neil lives in New York City. She hopes to someday have a better answer to the question, “what can a non-academic mathematician do that makes the world a better place?”
If you want to talk to Cathy directly email her at cathy.oneil at gmail dot com.
Q&A Cathy O'Neil, author of 'Weapons of Math Destruction,' on the dark side of big data
Cathy O'Neil, author of "Weapons of Math Destruction." (Adam Morganstern)
Christine Zhang
Cathy O’Neil calls herself a data skeptic. A former hedge fund analyst with a PhD in mathematics from Harvard University, the Occupy Wall Street activist left finance after witnessing the damage wrought by faulty math in the wake of the housing crash.
In her latest book, “Weapons of Math Destruction,” O’Neil warns that the statistical models hailed by big data evangelists as the solution to today’s societal problems, like which teachers to fire or which criminals to give longer prison terms, can codify biases and exacerbate inequalities. “Models are opinions embedded in mathematics,” she writes.
Although algorithms are everywhere, the most dangerous ones, according to O’Neil, have three characteristics: scale, secrecy and the capacity to do harm.
Recently reached by phone, O’Neil spoke about the prevalence of these “weapons of math destruction” across different industries. The conversation has been edited for length and clarity.
When did you first realize that big data could be used to perpetuate inequality?
I found out that the work I was doing on tailored advertising was a mechanism for for-profit colleges to find vulnerable, single black mothers. Find their pain points and promise them a better life if they signed up for online courses, which in the meantime loaded them up with debt and gave them a useless education. I was like, “That’s not helping anyone; that’s making their struggles worse, and it’s happening on my watch because I am the one building the technology for this to work very efficiently.”
What is a new example of a weapon of math destruction?
Recently, I was convinced by Mona Chalabi, who is a journalist at the Guardian but who also spent time at FiveThirtyEight, that political polls are actually weapons of math destruction. They’re very influential; people spend enormous amounts of time on them. They’re relatively opaque. But most importantly, they’re destructive in various ways. In particular, they actually affect people’s voting patterns. The day before the election, if people think their candidate is definitely going to win, then why bother voting? Polls can change people’s actual behavior, which disrupts democracy in a direct way.
People are trying to analyze how demographics shaped the election results. The answer is that we’re probably not going to know or have enough information to make an educated guess until much later.
That’s right. Also, there really were new things about this election cycle that we did not have data on, so we couldn’t account for them. But I’m not suggesting that all we need to do is correct the polls and next time they’ll be more accurate and therefore better. I’m actually trying to make the argument that we should just not do them. I honestly feel like if we had a thought experiment where nobody did polls and nobody talked about polls and we all just talked about the actual issues of the campaign, then we’d have a much better democracy.
In your book, you describe some relatively well-known examples of potentially harmful algorithms, such as value-added models that grade public schoolteachers based on student test scores. You tried to get the source code behind that model from the Department of Education in New York City, but you weren’t able to. Their defense was probably that if people knew how the scores were calculated, then teachers would be able to game the system to get higher scores.
Well, the very teachers whose jobs are on the line don’t understand how they’re being evaluated. I think that’s a question of justice. Everyone should have the right to know how they’re being evaluated at their job. And I should have the right to understand those models as well because I’m a taxpayer, and the job is a government position. The Freedom of Information Act should apply.
Also, if you use the word “gaming,” first you’re implying that there’s a bad actor involved, which sometimes there is. Second, you can really only game a model if it’s weak. The weakness of the teacher value-added model is that it’s statistically terrible. Anybody whose job is on the line deserves to understand that weakness. And deserves to, for that matter, take advantage of it if they can. But my goal isn’t for a bunch of teachers to sneakily get better scores. My goal is for the model itself to be held to high standards.
In some cases, the policymakers themselves probably don’t even know how the scores are calculated.
In the case that I wrote about in my book, nobody in New York City had access to that formula. Nobody. The Department of Education did not know how to explain the scores that they were giving out to teachers.
Los Angeles’s Department of Children and Family Services has been exploring a risk-modeling algorithm called AURA. It was developed by SAS, a private contractor, and it scores children according to their risk of being abused so that social workers can better target their efforts. Something like this could be a weapon of math destruction — it has scale, and the formula is secret — or it could be benign.
Or even positive. It really depends on what exactly they’re doing with those scores. It also depends on how those scores are created. Even if they’re being somewhat punitive, if they’re doing it in a way that has been discussed as morally fair, then that’s probably still OK. If they’re finding kids at risk of child abuse and they’re removing them from families when they have just cause, then we should think of that as a good thing. What would not be OK is if the score was elevated simply because somebody happened to be black or happened to be poor.
So you’re less worried about models that target people in order to help.
It’s tricky because there are different stakeholders. People who are advocating on behalf of the children might be perfectly OK with using questionable attributes that are proxies for race and class that also are proxies for other things that actually put these kids at higher risk. It’s not a clear-cut case, even when there is a punitive result. You have to weigh the possibility of letting a child get abused when you could have prevented it against the possibility of punishing a parent who wasn’t going to abuse their child. That decision could be implemented by the data scientist, but it should not be up to the data scientist to decide the answer to that question.
Should these issues be discussed before the algorithms are deployed?
I want to separate the moral conversations from the implementation of the data model that formalizes those decisions. I want to see algorithms as formal versions of conversations that have already taken place.
In addition to such conversations, you call for the auditing of algorithms, after they have been in use, to see whether they are, in fact, fair. Have you seen this happening in practice?
I just started a business called ORCAA [O'Neil Risk Consulting and Algorithmic Auditing]. My goal is to do the auditing of the algorithms. I would love to help the people who want to use AURA, for example.
Does anything like that exist already?
There have been a few algorithms audited. Notably, ProPublica did an audit of COMPAS, a recidivism risk algorithm. It’s not a full-blown audit. It didn’t go as far as I would have liked in understanding the different stakeholders. Just like there are advocates for the children versus the advocates for the parents in the case of AURA, with COMPAS, there are people — the police — who care about getting the bad guys, and there are people who care about making sure that black men don’t go to prison longer just because they’re black — the civil rights activists. The civil rights activists and the police need to have a conversation where they weigh the chances of letting a man go free who’s going to commit a further crime versus the chance of putting a man in jail for something that he didn’t do and will not do. That same kind of balanced conversation has to take place.
Also, ProPublica is a news organization. You would want to work for the policymakers themselves.
Yes, and I would want to sign an NDA. I wouldn’t write it up. If you’re trying to use an algorithm, I’m going to help you make sure it’s fair. Alternatively, if there’s a class of people who think that they’re unfairly judged by an algorithm and they want me to help them prove that, then I could do that, too.
Have you gotten a lot of demand for these services?
I have zero clients. A bunch of people are thinking about it. Big data is a new field, and people are essentially blindly trusting it. Also, people are still living in the era of plausible deniability. They don’t want to know that their data is racist or whatever, and so far, they’ve been getting away with not knowing. What would make them want to know? Only if there are legal reasons for them to want to know, or reputational risk reasons for them to want to know, or if there’s simply an overwhelming demand by the public.
Have you seen any creative ways in which algorithms are adopted by the people rather than by the powerful?
I’ve seen some small examples. I wrote a post a couple of months ago on [my blog] mathbabe about a college that was using big data to help find kids who would need advising.
What are some examples and characteristics of the opposite of a weapon of math destruction?
My theory is that if it’s scaled and it’s secret, then it had better be obviously not destructive. Or, if it’s scaled and there’s potential for destruction, then you have to make it transparent. If there’s potential, then we need to know more. AURA fits this perfectly. It is potentially destructive, so we need to know that it is not unfairly interfering with vulnerable families.
It’s not obvious. Somebody really needs to worry about those kids. I speak as a child who was abused, and I would have loved for there to have been [data-driven] interventions when I was a child. We didn’t even think about that stuff back then.
One defense of algorithms is that they are less biased than humans.
Some people just assume that. They don’t check it.
Is it hard to tell if the algorithm is better?
It’s impossible to guess; you have to actually look at the data. That’s the kind of auditing I want to do. I don’t want to just audit a specific algorithm by itself, I want to audit the algorithm in the context of where it’s being used. And compare it to that same context without the algorithm. Like a meta-audit. Is the criminal justice system better off with recidivism-risk algorithms or without recidivism-risk algorithms? That’s an important question that we don’t know the answer to. We also have the question of whether a specific recidivism-risk algorithm is itself racist. You could find that a specific algorithm is racist, but it’s still better than the status quo.
You’ve worked in academia, finance, advertising, tech, activism and journalism. What advice would you give to socially conscious data scientists in each of these fields?
That’s a hard question. First of all, I’d want them to all take ethics seriously. But it is a challenge. Just taking a job as a data scientist means you’re probably working for a company for whom success means profit. So you might find yourself in the same kind of situation I found myself in, where I wasn’t directly working for a for-profit college, but I was working in an industry that helped those kinds of places survive and flourish.
Maybe data science courses should also teach ethics.
That’s definitely something I’m calling for. Every computer science major and any kind of data science program should have ethics, absolutely.
Data scientist Cathy O’Neil on the cold destructiveness of big data
Nikhil Sonnad December 07, 2016
Cathy O'Neil wants to create a better big data future. (Quartz/Johnny Simon)
Big data promises fairness. With enough information about individuals and populations, we can design algorithms that will identify the best possible answer to a given question, free of human bias. Algorithms, after all, are not racist, sexist, or elitist.
Or are they?
In her new book, Weapons of Math Destruction, data scientist Cathy O’Neil discusses example after example of algorithms that make troublingly unfair decisions. Algorithms that—under the guise of math, fairness, and objectivity—reinforce and magnify the old biases and power dynamics that we hoped they would eliminate. A former mathematician and finance quant, O’Neil is now turning her technical skills toward the goal of fulfilling that old promise of fairness through big data. (She is arguably the mathiest member of the Occupy Wall Street movement.)
O’Neil and I spoke at Quartz’s offices in New York. We discussed the decisions algorithms are making about us, what’s wrong with them, and her vision for a better algorithmic future. (This transcript has been edited for concision and clarity.)
Quartz: Your book is partially about algorithmic decisions that people aren’t aware of, or that aren’t expected. What are some decisions being made by algorithms that people might not know about?
O’Neil: All of the time when you’re on the internet. All of the time. I have a couple examples that I like to tell because they affect everyone, and everyone is kind of offended by them in a very direct way.
One of them is, you call up customer service, and from your phone number they infer your value as a customer. If you’re a low-value customer you will wait on hold forever. If you’re a high-value customer you get to a customer representative immediately. And if you’re low-value, you’re likely to be told by the rep, “Oh you’re low-value, you’re not going to get what you want.” That happens. I didn’t even know the rep knows your score, but turns out that in that system they actually do. And they can say, “I’m not going to give you what you want.”
QZ: And so most people assume that what’s happening is just some kind of case-by-case human decision?
O’Neil: They figure that when you call, you just get in line. That’s one example. Another example is, you go to Capital One’s website, they infer based on your browsing information what kind of value as a customer you represent, and they’ll show you a different ad based on what I call your e-credit score. Not your actual credit score, because they don’t have access to that. But they kinda make up an ad-hoc credit score, based on your profile information, at the moment you arrive, and show you high-interest or low-interest advertisements.
Cathy O’Neil’s new book.
QZ: So to some extent these decisions are made in secret. That’s one of the three characteristics you identify of a dangerous algorithm, what you call a Weapon of Math Destruction, or WMD. Tell us about these characteristics.
O’Neil: The three that I worry about are scale, secrecy, and destructiveness. It’s not just scale alone. The Netflix algorithm is widespread, but I don’t care about it. Maybe because it’s not destructive, but it’s also just not important. The worst that can happen to a person who gets a bad suggestion on Netflix is that they’re like, “That movie sucked.” So I care about scale and importance. It has to affect a lot of people in an important way. That’s why I talk about things like jobs, prison, and insurance.
The second thing is secrecy. People are basically getting or not getting those things that they need based on scores that they don’t understand and sometimes don’t even know exist. Right there you already have something very dangerous. If you have something that’s that important, and it’s secret, that is already something you can object to.
Those first two things are already red flags that should be under scrutiny by society and by regulators. But in addition to that, I feel like I found examples in the book that are also provably destructive. And when I say destructive, it’s not just destructive for the individual, typically—and this is just an observation, really—it’s not just the individual’s life, but it actually engenders a feedback loop that is destructive. A pernicious feedback loop that often undermines its original goal.
QZ: A striking example of a WMD in the book with this kind of feedback loop is the teacher value-added model, which tries to identify bad teachers by comparing their students’ performance from one year to the next. In this case the model seems to have set up various incentives that kind of contradicted each other, leading to a worse outcome overall.
O’Neil: Michelle Rhee in Washington, DC, was this really gung-ho education reformer. She was hired to apply all of these new reform ideas. She instituted both a bonus for people who got good teacher assessment scores, and for principals at those schools, and she would fire teachers with bad enough scores. What happened, we have reason to believe, is that in DC a lot of teachers just cheated. They like, changed the answers on their students’ tests.
It’s obvious to everyone that if you incentivize something like good standardized test scores, then the teachers are going to teach to the test. But it should also be obvious that if there’s enough carrot and/or enough stick, it’s going to be more extreme than that. You’re going to actually see cheating. And that’s what we saw. We saw a dubious and unusual number of erasures at various schools, including one of the schools that was sending kids to a class taught by Sarah Wysocki. So these kids came into her class with very good scores from the previous year—from a school that had unusual numbers of erasures—but who couldn’t read or write at grade level. So it was very suspicious.
And then she got fired based on her teacher assessment score, and she has reason to believe that the expected score for those kids was unreasonably inflated. So she couldn’t meet those expectations. If you have cheating, then the teacher the year after that can’t possibly keep up. And they are going to get dinged for that other teacher’s cheating. Even if that’s not exactly what happened, if you think about it in terms of incentives, this is what’s going to happen. You’re gonna get cheating. It’s just something we should know by now.
QZ: How should we think about what a “model” or “algorithm” is, generally?
O’Neil: For me, what I mean by this is something that takes in data, and a definition of success, and trains to learn to seek patterns. And then the model becomes a way of predicting. The most important aspect of this thing is not at the technical level; the most important thing is the usage. How is it being marketed? How is it being applied? How are the powers that own it deploying it? The examples I care about are using a model as a sorting mechanism—a scoring system, where people are scored, ranked, and sorted by that score. And then there are serious consequences for a high or a low score.
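That definition translates almost directly into code. The schematic below uses invented fields and weights rather than any system discussed in the book; its point is that the “definition of success” lives in a handful of numbers, and the sorting does the rest:

# Schematic of the definition above: data in, a "definition of success"
# encoded as weights, a score out, and people ranked and treated by that score.

people = [
    {"name": "A", "years_experience": 2, "gaps_in_resume": 1},
    {"name": "B", "years_experience": 9, "gaps_in_resume": 0},
    {"name": "C", "years_experience": 5, "gaps_in_resume": 3},
]

# The "definition of success" lives entirely in these weights; change them
# and the ranking, and everyone's outcome, changes with them.
WEIGHTS = {"years_experience": 1.0, "gaps_in_resume": -2.0}

def score(person: dict) -> float:
    return sum(WEIGHTS[key] * person[key] for key in WEIGHTS)

ranked = sorted(people, key=score, reverse=True)
shortlist, rejected = ranked[:1], ranked[1:]  # serious consequences of the rank
print([p["name"] for p in shortlist], [p["name"] for p in rejected])  # ['B'] ['A', 'C']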
QZ: How do you think we got to this point where it’s a black box that everyone agrees comes up with a “correct” answer, and that there’s no reason to challenge it?
O’Neil: We have an infrastructure, probably a well-deserved infrastructure in this country, whereby we trust science. Science has done a lot for us. It predicts things like eclipses really well. The sleight of hand that’s happened in the big data era is that we think we can manifestly move that technology onto the human sphere. And we can’t. There are a couple obvious reasons we can’t. Obvious in hindsight.
The number one reason is that when we predict the movement of stars or moons, we don’t change their movements. But when we predict people—and we actually funnel them and channel them into different slots depending on what their score was—or we give them different Facebook feeds depending on what we think they’re going to like, we’re not just predicting what they’re going to like, we’re actually influencing what they’re going to like. So we’re engendering feedback loops. That’s one thing.
The other thing is that we have this belief—which is just wrong—that data itself is inherently objective. That it is somehow created in an objective manner. And in the cases of predictive policing, or recidivism risk algorithms, when the data itself is so completely biased, every single problem of that system follows from the data bias.
We could talk endlessly about what it is we’re doing when we give someone a high risk of recidivism and then send them to prison longer based partly on where they were born rather than what they’ve actually done. But at the end of the day, what we’re talking about is biased data. And it’s biased again because of systemic biases, systemic racism, et cetera. This idea that we’re just going by the numbers, that we’re just following the data, and that the data never lies—that is just wrong.
Discussing WMDs, not the kind you think. (Quartz/Johnny Simon)
QZ: Let’s look at the case of a hiring manager trying to find an employee. Your book talks about how many such decisions are now being made by algorithms. Should we be comparing the algorithmic decision to some kind of ideal that we’ve thought through ethically and philosophically, or should we be comparing it to what was done previously based on human intuition and interviews?
O’Neil: It’s absolutely a great point that many of these algorithms, although problematic, might actually be improvements from the past. Something like the recidivism risk algorithm—that’s been introduced because judges are nefariously and famously racist. There’s so much evidence that they’re racist. My problem right now is twofold. Number one, these algorithms are also racist. Number two, as far as I know, we haven’t actually measured whether it’s better or worse when you use these scores. And there’s an actual avoidance of answering this question.
So while you’re right that the things that these algorithmic systems are replacing were not perfect, we actually don’t know if it’s getting better or worse. At the very least, let’s see. Let’s test that. Because if we had evidence that judges were racist, we can look for evidence that these new systems are better.
But the other thing I want to say is, the promise of big data, which is not being realized right now, is that we could actually make things better. Plain old better. And we could sit down and have philosophical disagreements and arguments and then sort out some kind of compromise that would involve ideas and theories, and we could implement those using algorithms. And we could have scoring systems that we agree are fair and better than the average stupid human version. But we’re never going to get there if we pretend that what we have right now is already perfect.
QZ: One of the things that I’m really freaked out about from reading this book is this idea of algorithms codifying the past. Another point of this question of “is the algorithm better than humans or not” is, well at least in the past, we just knew that society could somehow free itself of racist individuals at some point.
O’Neil: Yeah, they would die out.
QZ: Right. But now we have this model that is thought to be “real” and “objective.”
O’Neil: It’s thought to be a solution to racism, but in fact it’s codifying racism.
QZ: So can you talk about how this process of codifying the past happens?
O’Neil: Yeah. The thought experiment I like to give sometimes is, look at Roger Ailes, if your readers are uncomfortable with the idea of him being guilty as sin of making it impossible for women to succeed as Fox News anchors. Or just make up any company that has a problem whereby a certain population–say women–are systematically denied success. Not because they’re not good at what they do, but because it’s just a crazy place. And then this happens for 20 years. Now there’s a scandal, Roger Ailes is kicked out, and Fox News vows to do better.
So they introduce a machine-learning algorithm that’s going to replace their hitherto scandalously terrible hiring algorithm. That sounds like good news for people who haven’t read my book, because they’re like, “Oh, machine learning, that’s going to make it fair.” The problem is, if you think about what that means, you’re going to have 20 years of data on people applying to Fox News, and ask, “Ok, five years after that did they get promoted? Did they stay for five years?”
As a data scientist, that’s the kind of thing I’d do. Define success for an applicant as staying there for five years and getting promoted twice. Then I would train my model to look for people that look like these people who succeeded in the past. That’s how machine learning works. And then I would apply that to the current pool of applicants, and I would see that I was systemically removing women. Because women in the past were not allowed to succeed. That for me is a crystalline example of how machine learning codifies the past. As long as you’re training it on historical data that is problematic.
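Her thought experiment can be reproduced with synthetic data. In the sketch below the labels, probabilities, and field names are all invented; the “learner” is deliberately trivial, but it shows how training on a history in which women were denied success reproduces the exclusion:

import random
random.seed(0)

# Hypothetical "20 years of hiring data": success was historically denied to
# women regardless of skill, so the label encodes the bias, not ability.
history = [{"skill": random.random(),
            "gender": gender,
            "promoted_twice": (gender == "M" and random.random() < 0.6)}
           for gender in ("M", "F") for _ in range(1000)]

# A deliberately simple "learner": estimate the historical success rate per group.
def group_rate(group: str) -> float:
    rows = [row for row in history if row["gender"] == group]
    return sum(row["promoted_twice"] for row in rows) / len(rows)

model = {g: group_rate(g) for g in ("M", "F")}
print(model)  # roughly {'M': 0.6, 'F': 0.0}: the model has "learned" the bias

def screen(applicant: dict) -> bool:
    # New applicants are judged by the historical success of people like them.
    return model[applicant["gender"]] > 0.3

print(screen({"gender": "F", "skill": 0.99}))  # False: a skilled woman is filtered out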
QZ: Right, and we don’t have any other kind of data to train it on.
O’Neil: We don’t. We really don’t.
QZ: Do you think we give humans enough credit for being able to make good decisions?
O’Neil: We as a society or as data scientists?
QZ: I think I’m looking at the societal level. Personally I think, if we have systems that we know are unfair or unequal, and this is known by humans in society, it’s better to just have humans make decisions with the knowledge of this unfairness, versus trying to create some abstract model that ends up codifying these biases forever.
O’Neil: I really think it depends. So, I actually think that one of the things that’s great about humans is that if you had a bunch of human hiring managers, they would have different biases. And one of the problems that I talk about in the book is that, with the algorithm, the same mistake is propagated throughout the system.
Let me give you an example where I really do feel like we are trusting the machines too much. This is when it’s not systematic at all. I feel like, Nate Silver fucked up with Donald Trump for a really important reason, which is that Trump was successful in a way that Nate Silver didn’t recognize. Even when he was polling well, Nate Silver dismissed him, because he said, I’ve never seen this kind of success.
For that same reason—I hope you can jump with me over here—the machine learning algorithm is never going to figure out who’s the next great painter. The algorithm might very well be able to tell you whether a certain new painting or painter is going to be relatively successful, because it looks a lot like successful paintings. But it will never be able to say, this is new, and it’s awesome. And that is fundamentally the job of a human being, to say, this is new, and it’s a big deal.
So that’s an argument from my book, which is that algorithms will never improve us as a society. Because the best they will do is help us create something that we’re comfortable with and that we think of as fair. But as we evolve, they will never keep up with us. We will have to explicitly evolve the computers. We’ll have to train the algorithms. There will always be a lag between even the best, most fair and most principled algorithm and the humans that are leading the way. We are leading the way.
QZ: What can we do to make sure that we’re using math, machine learning, and data science to create the society that we want versus baking in the one that we have right now, or potentially making it worse?
O’Neil: There’s a lot we can do in the direction of making sure the algorithms we’re using are forces for good rather than forces for evil. We haven’t even begun to consider this. We certainly haven’t required algorithms to undergo these “safety checks,” as I call them.
We’re at the stage where we were probably 100 years ago or more, when car companies put out cars that like, had wheels falling off. And we’re just like, “Oh! Ok, you died.” We need to define safety standards, and check on them as we do with things like cars. It’s not a perfect analogy because the truth is, at least everyone notices when you die in a car. The neighbors notice. Whereas with algorithms it’s not as apparent that people have suffered. So maybe a better analogy would be like, we used to let people pollute rivers. But now we have this idea that this is toxic, and you need to keep track of it.
But we’re starting to pay more attention to it. We have no definitions of safety standards, we have not instituted them, and we need to start doing that. The good news is, we can totally do that. We can totally do that. We need to build tools to do that.
Cathy O'Neil
From Wikipedia, the free encyclopedia
[Photo caption: Cathy O'Neil at Google Cambridge in 2016]
Nationality: American
Fields: Mathematics
Institutions: MIT; Barnard College; D.E. Shaw; Columbia University
Alma mater: UC Berkeley; Harvard University
Thesis: Jacobians of Curves of Genus One
Doctoral advisor: Barry Charles Mazur
Website: mathbabe.org
Catherine ("Cathy") Helen O'Neil is the author of the blog mathbabe.org and several books, including Weapons of Math Destruction. She was the former Director of the Lede Program in Data Practices at Columbia University Graduate School of Journalism, Tow Center and was employed as Data Science Consultant[1] at Johnson Research Labs.
She lives in New York City and is active in the Occupy movement.[2]
Education and academic career[edit]
O'Neil attended UC Berkeley as an undergraduate,[2] received a Ph.D. in mathematics from Harvard University in 1999,[3] and afterward held positions in the mathematics departments of MIT and Barnard College, doing research in arithmetic algebraic geometry.[4] She left academia in 2007, and worked for four years in the finance industry, including two years at the hedge fund D. E. Shaw.[5] After becoming disillusioned with the world of finance, O'Neil became involved with the Occupy Wall Street movement,[2] participating in its Alternative Banking Group.[6]
She is a co-author (with Rachel Schutt) of Doing Data Science: Straight Talk from the Frontline (O'Reilly 2013, ISBN 1449358659). She also wrote an e-book On Being a Data Skeptic (O'Reilly Media 2013, ASIN: B00G3M9JY2). Her book Weapons of Math Destruction was published in 2016 (Crown, ISBN 0553418815) and has been nominated for the 2016 National Book Award for Nonfiction.[7][8]
Personal life[edit]
O'Neil lives in New York City with her husband Aise Johan de Jong and their three sons.[9]
Weapons of Math Destruction: Cathy O'Neil adds up the damage of algorithms
The Harvard PhD and data scientist talks about her new book and ponders how people’s fear and trust of math is akin to worshipping God
Cathy O’Neil: ‘I left disgusted by finance because I thought of it as a rigged system.’ Photograph: Adam Morganstern
Mona Chalabi
Thursday 27 October 2016 13.08 EDT
“People keep suggesting that democracy is alive and well because we have two parties that don’t agree on everything. I think that’s total bullshit.” When you meet Cathy O’Neil, a data scientist and author, you quickly discover she isn’t exactly convinced about the health of the US’s electoral system.
A Harvard PhD graduate in mathematics and actively involved in the Occupy movement, O’Neil’s experience is crucial to her new book: Weapons of Math Destruction describes the way that math can be manipulated by biases and affect every aspect of our lives.
As well as questioning the two-party system in the US, she’s also looked at how mathematics has been used in the housing and banking sector to affect our lives via her blog mathbabe for more than a decade. So what’s her problem with good old American democracy in 2016?
“Democracy is more than a two-party system. It’s an informed public and that’s what’s at risk,” she says. “The debates are where you would hope to find out real information, but they’re just talking about their dick size … The algorithms are making it harder and harder to get good information.” And algorithms, rule-based processes for solving mathematical problems, are being applied to more and more areas of our lives.
This idea is at the heart of O’Neil’s thinking on why algorithms can be so harmful. In theory, mathematics is neutral – two plus two equals four regardless of what anyone wishes the answer was. But in practice, mathematical algorithms can be formulated and tweaked based on powerful interests.
O’Neil saw those interests first hand when she was a quantitative analyst on Wall Street. Starting in 2007, O’Neil spent four years in finance, two of them working for a hedge fund. There she saw the use of weapons of math destruction, a term O’Neil uses to describe “algorithms that are important, secret and destructive”. The algorithms that ultimately caused the financial crisis meet all of those criteria – they affected large numbers of people, were entirely opaque and destroyed lives.
“I left disgusted by finance because I thought of it as a rigged system and it was rigged for the insiders,” says O’Neil. “I was ashamed by that – as a mathematician I love math and I think math is a tool for good.”
Among the many examples of powerful formulas that O’Neil cites in her book, political polling doesn’t come up, even though this election cycle has made polling’s power more talked about than ever before. So is it dangerous? Could polling be a weapon of math destruction?
She pauses – “I’m not sure” – then she pauses some more. “I think polling is a weapon of math destruction,” she says. “Nobody really understands it, it’s incredibly widespread and powerful.” We discuss the success of Nate Silver, the founder and editor-in-chief of FiveThirtyEight (a site I spent almost two years working at). Silver has positioned himself as one of the few people who does understand polling, and as such he’s been christened as a soothsayer and savant. We’re desperate for math answers, which is part of the reason we ended up here, according to O’Neil.
“You don’t see a lot of skepticism,” she says. “The algorithms are like shiny new toys that we can’t resist using. We trust them so much that we project meaning on to them.”
That desperation is potentially very damaging to democracy. Increasingly the public is informed about polling data, not policy information, when deciding who to elect. “It’s self-referential,” O’Neil explains.
Like so many algorithms, political polls have a feedback loop – the more we hear a certain candidate is ahead in the polls, the more we recognize their name and the more we see them as electorally viable.
O’Neil’s book explains how other mathematical models do a similar thing – such as the ones used to measure the likelihood an individual will relapse into criminal behavior. When someone is classed as “high risk”, they’re more likely to get a longer sentence and find it harder to find a job when they eventually do get out. That person is then more likely to commit another crime, and so the model looks like it got it right.
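That loop can be simulated directly. The toy model below uses invented probabilities; a “high risk” label is given real consequences (a longer sentence, worse job prospects), and the label ends up partly causing the reoffending it claims to predict:

import random
random.seed(1)

# Toy feedback loop: the label drives the sentence, the sentence drives job
# prospects after release, and job prospects drive reoffending.
def simulate(labeled_high_risk: bool, n: int = 10_000) -> float:
    reoffended = 0
    for _ in range(n):
        longer_sentence = labeled_high_risk
        job_after_release = random.random() < (0.2 if longer_sentence else 0.5)
        p_reoffend = 0.15 if job_after_release else 0.45
        reoffended += random.random() < p_reoffend
    return reoffended / n

print(simulate(True))   # ~0.39: the label helped cause the outcome it "predicted"
print(simulate(False))  # ~0.30: same population, without the label's consequences

The gap between the two numbers is the model validating itself, not measuring the person.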
And then there are those biases. Contrary to popular opinion that algorithms are purely objective, O’Neil explains in her book that “models are opinions embedded in mathematics”. Think Trump is a hopeless candidate? That will affect your calculations. Think black American men are all criminal thugs? That affects the models being used in the criminal justice system, too.
Ultimately algorithms, according to O’Neil, reinforce discrimination and widen inequality, “using people’s fear and trust of mathematics to prevent them from asking questions”. The seemingly contradictory words “fear” and “trust” leap out to me: how many other things do we both fear and trust, except perhaps for fate or God? O’Neil agrees. “I think it has a few hallmarks of worship – we turn off parts of our brain, we somehow feel like it’s not our duty, not our right to question this.”
But sometimes it’s hard for non-statisticians to know which questions to ask. O’Neil’s advice is to be persistent. “People should feel more entitled to push back and ask for evidence, but they seem to fold a little too quickly when they’re told that it’s complicated,” she says. If someone feels that some formula has affected their life, “at the very least they should be asking: how do you know that this is legal? That it isn’t discriminatory?”
But often we don’t even know where to look for those important algorithms, because by definition the most dangerous ones are also the most secretive. That’s why the catalogue of case studies in O’Neil’s book is so important; she’s telling us where to look.
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy is out now and published by Crown
Math is racist: How data is driving inequality
by Aimee Rawlins @aimeerawlins
September 6, 2016: 5:24 PM ET
It's no surprise that inequality in the U.S. is on the rise. But what you might not know is that math is partly to blame.
In a new book, "Weapons of Math Destruction," Cathy O'Neil details all the ways that math is essentially being used for evil (my word, not hers).
From targeted advertising and insurance to education and policing, O'Neil looks at how algorithms and big data are targeting the poor, reinforcing racism and amplifying inequality.
These "WMDs," as she calls them, have three key features: They are opaque, scalable and unfair.
Denied a job because of a personality test? Too bad -- the algorithm said you wouldn't be a good fit. Charged a higher rate for a loan? Well, people in your zip code tend to be riskier borrowers. Received a harsher prison sentence? Here's the thing: Your friends and family have criminal records too, so you're likely to be a repeat offender. (Spoiler: The people on the receiving end of these messages don't actually get an explanation.)
The models O'Neil writes about all use proxies for what they're actually trying to measure. The police analyze zip codes to deploy officers, employers use credit scores to gauge responsibility, payday lenders assess grammar to determine credit worthiness. But zip codes are also a stand-in for race, credit scores for wealth, and poor grammar for immigrants.
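How completely a proxy can smuggle in the attribute it replaces is easy to demonstrate. The sketch below uses synthetic groups and zip codes with residential segregation dialed to 90% for illustration; race is never an input, yet zip code alone recovers it about nine times in ten:

import random
random.seed(2)

# Synthetic population: group membership strongly predicts zip code.
population = []
for _ in range(10_000):
    group = random.choice("AB")
    if group == "A":
        zip_code = "100" if random.random() < 0.9 else "105"
    else:
        zip_code = "105" if random.random() < 0.9 else "100"
    population.append((group, zip_code))

# How well does the "race-blind" feature recover group membership?
recovered = sum((z == "100") == (g == "A") for g, z in population) / len(population)
print(recovered)  # ~0.9: zip code carries most of the signal it was supposed to omit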
O'Neil, who has a PhD in mathematics from Harvard, has done stints in academia, at a hedge fund during the financial crisis and as a data scientist at a startup. It was there -- in conjunction with work she was doing with Occupy Wall Street -- that she became disillusioned by how people were using data.
"I worried about the separation between technical models and real people, and about the moral repercussions of that separation," O'Neil writes.
She started blogging -- at mathbabe.org -- about her frustrations, which eventually turned into "Weapons of Math Destruction."
One of the book's most compelling sections is on "recidivism models." For years, criminal sentencing was inconsistent and biased against minorities. So some states started using recidivism models to guide sentencing. These take into account things like prior convictions, where you live, drug and alcohol use, previous police encounters, and criminal records of friends and family.
These scores are then used to determine sentencing.
"This is unjust," O'Neil writes. "Indeed, if a prosecutor attempted to tar a defendant by mentioning his brother's criminal record or the high crime rate in his neighborhood, a decent defense attorney would roar, 'Objection, Your Honor!'"
But in this case, the person is unlikely to know the mix of factors that influenced his or her sentencing -- and has absolutely no recourse to contest them.
Or consider the fact that nearly half of U.S. employers ask potential hires for their credit report, equating a good credit score with responsibility or trustworthiness.
This "creates a dangerous poverty cycle," O'Neil writes. "If you can't get a job because of your credit record, that record will likely get worse, making it even harder to work."
This cycle falls along racial lines, she argues, given the wealth gap between black and white households. This means African Americans have less of a cushion to fall back on and are more likely to see their credit slip.
And yet employers see a credit report as data rich and superior to human judgment -- never questioning the assumptions that get baked in.
In a vacuum, these models are bad enough, but O'Neil emphasizes, "they're feeding on each other." Education, job prospects, debt and incarceration are all connected, and the way big data is used makes them more inclined to stay that way.
"Poor people are more likely to have bad credit and live in high-crime neighborhoods, surrounded by other poor people," she writes. "Once ... WMDs digest that data, it showers them with subprime loans or for-profit schools. It sends more police to arrest them and when they're convicted it sentences them to longer terms."
In turn, a new set of WMDs uses this data to charge higher rates for mortgages, loans and insurance.
So, you see, it's easy to be discouraged.
And yet O'Neil is hopeful, because people are starting to pay attention. There's a growing community of lawyers, sociologists and statisticians committed to finding places where data is used for harm and figuring out how to fix it.
She's optimistic that laws like HIPAA and the Americans with Disabilities Act will be modernized to cover and protect more of your personal data, that regulators like the CFPB and FTC will increase their monitoring, and that there will be standardized transparency requirements.
And then there's the fact that these models actually have so much potential.
Imagine if you used recidivist models to provide the at-risk inmates with counseling and job training while in prison. Or if police doubled down on foot patrols in high crime zip codes -- working to build relationships with the community instead of arresting people for minor offenses.
You might notice there's a human element to these solutions. Because really that's the key. Algorithms can inform and illuminate and supplement our decisions and policies. But to get not-evil results, humans and data really have to work together.
"Big Data processes codify the past," O'Neil writes. "They do not invent the future. Doing that requires moral imagination, and that's something only humans can provide."
'Weapons Of Math Destruction' Outlines Dangers Of Relying On Data Analytics
September 12, 2016, 4:40 PM ET
Heard on All Things Considered
NPR's Kelly McEvers talks with data scientist Cathy O'Neil about her new book, Weapons of Math Destruction, which describes the dangers of relying on big data analytics to solve problems.
KELLY MCEVERS, HOST:
We are in a time of big data. In recent years, NPR's done stories about how data analytics are being used to help political campaigns rally supporters, compare the cost of similar surgeries in different cities, track public buses in real time and even maybe identify police officers at risk of committing misconduct. But the question is are we putting too much faith in big data? That's the question we're asking in this week's All Tech Considered.
MCEVERS: In her new book, mathematician Cathy O'Neil says we are in a techno utopia. And she does not mean that in a good way. Her book is called "Weapons Of Math Destruction: How Big Data Increases Inequality And Threatens Democracy." And she is with us now. Welcome to the show.
CATHY O'NEIL: Honored to be here, Kelly.
MCEVERS: So tell us what you mean by techno utopia.
O'NEIL: Well, techno utopia is this idea that the machine-learning tools, the algorithms, the things that help Google, like, have cars that drive themselves, that these tools are somehow making things objective and fair when, in fact, we really have no idea what's happening to most algorithms under the hood.
MCEVERS: So it sounds like when you're saying, you know, we have these algorithms, but we don't know exactly what they are under the hood, there's this sense that they're inherently unbiased. But what you're saying is that there's all kinds of room for biases.
O'NEIL: Yeah, for example, like, if you imagine, you know, an engineering firm that decided to build a new hiring process for engineers and they say, OK, it's based on historical data that we have on what engineers we've hired in the past and how they've done and whether they've been successful, then you might imagine that the algorithm would exclude women, for example. And the algorithm might do the right thing by excluding women if it's only told just to do what we have done historically. The problem is that when people trust things blindly and when they just apply them blindly, they don't think about cause and effect.
They don't say, oh, I wonder why this algorithm is excluding women, which would go back to the question of, I wonder why women haven't been successful at our firm before? So in some sense, it's really not the algorithm's fault at all. It's, in a large way, the way we apply algorithms and the way we trust them that is the problem.
MCEVERS: Your book has a lot of examples where big data has not lived up to its promise. I was wondering if you could give one example where this happened and, in fact, actually made things even worse?
O'NEIL: Yeah, well, so everybody knows about the sort of decades-long attempt to improve public education in the United States. It goes by various names like No Child Left Behind, you know, Race to the Top. But at the end of the day, what they've decided to do in a large part is to sort of remove these terrible teachers that we keep hearing about. And the way they try to find these terrible teachers is through something called the growth model. And the growth model, mathematically speaking, is pretty weak and has had, like, lots of unintended consequences.
When I say weak, I interviewed a teacher from New York City public schools named Tim Clifford. He's been teaching for 20 years, multiple awards, he's written quite a few books. He got a 6 out of 100 one year and then a 96 out of 100 the next year. And he says his techniques didn't change. So it's very inconsistent. It's not clear what this number is actually scoring in terms of teachers and the teaching ability. I interviewed a woman named Sarah Wysocki in the D.C. area who actually got fired because of her low growth model score.
MCEVERS: There must be other examples, though, where people, you know, good teachers got good scores.
O'NEIL: Yeah, I mean, there certainly are, but I would say it's relatively close to a random number generator. So the fact that some good teachers got good scores doesn't say enough. I guess the point is that you might have some statistical information when you hear a score, but it's not accurate enough to actually decide on whether a given teacher, an individual teacher is doing a good job. But it's treated as such because people just trust numbers, they trust scores.
MCEVERS: When you think about the kinds of problems that people are trying to solve with big data going forward, what are some of the areas where you think, yeah, just don't use data to do that one? That one's too complicated.
O'NEIL: It's such a massive field, like, you absolutely need to perform triage. So I really - I very, very carefully defined the kinds of algorithms that I worry about. And they have three characteristics. The first is that they're high-impact, they affect a lot of people. It's widespread and it's an important decision that the scoring pertains to, so like a job or going to jail, something that's important to people. So it's high-impact. The second one is that the things that worry me the most are opaque. Either that means that the people who get the scores don't understand how they're computed or sometimes that means that they don't even know they're getting scored.
Like if you're online, you don't even know you're scored but you are. And the third characteristic of things that I care about, which I call weapons of math destruction, the third characteristic is that they are actually destructive, that they actually can really screw up somebody's life. And most of the time, these algorithms are created with, like, good intentions in mind. But this destructiveness typically undermines that good intention and actually creates a destructive feedback loop.
MCEVERS: Flagging you as a potential criminal to police or flagging you as some sort of person in a potential hiring position.
O'NEIL: Exactly. If you just imagine, like, something that is pretty well-known is that credit scores are being used to deny people jobs. And that actually creates worse credit scores. You know, an individual who doesn't get a job because they have a bad credit score goes on to having even worse credit scores. States are trying to prevent that from happening on a state-by-state basis. But what we have now in the age of big data is something called electronic credit scores, E-scores, that a lot of people don't even know are being made about them.
They're not illegal, they're not regulated. And they could lead to the same kind of drastic and pernicious feedback loops.
MCEVERS: You know, we've talked a lot about the problems and the dangers of big data - I want to know some of the things that you think it could be used for in a good way, some of the kinds of problems it can tackle. I mean, this is a powerful thing. Are there smart ways we can use it?
O'NEIL: So in the book, I talk a lot about predictive policing and recidivism risk scoring, which are two different kinds of algorithms that are currently being used to decide who to target, which neighborhoods the police should target and then, like, whether criminal defendants are high-risk and whether they should go to jail for longer. What I don't see happening, which I wish were happening with respect to the justice system, is a kind of audit of the entire process overall. We know, for example, that when people are held in solitary confinement, it's probably not good for them.
So why don't we use data to improve the entire system, including, like, knowing those kinds of attributes? How much time did this person spend in solitary confinement? What were the actual conditions? Was the GED offered as a - you know, what kind of facilities did they have? You know, I'd like to see the Amazon Workflow big data audit system be applied to the justice system. I think a lot could be learned and a lot could be improved.
MCEVERS: That's Cathy O'Neil. She's the author of "Weapons Of Math Destruction: How Big Data Increases Inequality And Threatens Democracy." Thank you so much.
O'NEIL: Thank you so much.
(SOUNDBITE OF MUSIC)
O'Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
Rachel Bridgewater
Library Journal. 141.19 (Nov. 15, 2016): p100.
Copyright: COPYRIGHT 2016 Library Journals, LLC. A wholly owned subsidiary of Media Source, Inc. No redistribution permitted.
http://www.libraryjournal.com/
Full Text:
O'Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown. Sept. 2016. 272p. notes, index. ISBN 9780553418811. $26; ebk. ISBN 9780553418828. POLSCI
As mathematical models affect more and more aspects of our lives, it is crucial to understand that algorithms are not neutral, free from human prejudice and fallibility; instead, those biases and failings are encoded into the systems. Data scientist O'Neil, who blogs at mathbabe.org, explores this premise in depth and chillingly describes the extent to which data-driven, algorithm-based decision making in such areas as hiring, policing, lending, education, and health care actually increases inequality. With barely contained exasperation, O'Neil chronicles the way these "weapons of math destruction"--opaque and unregulated--shape all lives, especially those of the poor. More than just sounding the clarion call to action, O'Neil seeks to empower her readers to ask questions about the algorithms and to demand change. Though the subject matter is alarming and dire, O'Neil's dry wit and ease when describing complicated ideas are more enlivening than depressing. VERDICT This important book will be eye-opening to many readers, possibly even those involved with the kind of models O'Neil criticizes.--Rachel Bridgewater, Portland Community Coll. Lib., OR
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
Publishers Weekly. 263.24 (June 13, 2016): p87.
Copyright: COPYRIGHT 2016 PWxyz, LLC
http://www.publishersweekly.com/
Full Text:
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
Cathy O'Neil. Crown, $26 (272p) ISBN 978-0-553-41881-1
This taut and accessible volume, the stuff of technophobes' nightmares, explores the myriad ways in which large-scale data modeling has made the world a less just and equal place. O'Neil speaks from a place of authority on the subject: a Barnard professor turned Wall Street quant, she renounced the latter profession after the 2008 market collapse and decided to educate laypeople. Unlike some other recent books about data collection, hers is not hysterical; she offers more of a chilly wake-up call as she walks readers through the ways the "big data" industry has facilitated social ills such as skyrocketing college tuitions, policing based on racial profiling, and high unemployment rates in vulnerable communities. She also homes in on the ways these systems are frequently destructive even to the privileged: sloppy data-gathering companies misidentify people and flag them as criminals, and algorithms determine employee value during company-wide firings. The final chapter, in which O'Neil discusses Facebook's increasing electoral influence, feels eerily prescient. She offers no one easy solution, but has several reasonable suggestions as to how the future can be made more equitable and transparent for all. Agent: Jay Mandel, William Morris Endeavor. (Sept.)
Doing data science: straight talk from the front line: while having many pluses, this book is not for every budding data analyst
John A. Wass
Scientific Computing. (June 2014): p27.
Copyright: COPYRIGHT 2014 Advantage Business Media
http://www.scientificcomputing.com/
Full Text:
I can most simply describe this book by quoting from the back cover: Motivation--"... how can you get started in a wide-ranging, interdisciplinary field that's so clouded in hype?" Background Needed--"If you're familiar with linear algebra, probability, and statistics, and have programming experience ..."
However, all this just describes a limited set of skills needed to get started and says nothing about the authors' attack on the problem (i.e., learning data science, or at least getting started) and the level of knowledge needed to derive maximal benefit from this book. In that respect, it is a most maddening exercise in bouncing from the overly simple to the overly complex. And it does this rapidly and through most of the technical portions of the book. Let's take a closer look ...
To begin, in reading the first 54 pages, the reader gets the idea that there is a lot of intense introspection on data science and scientists, and that the principal author (with a Ph.D. in statistics) would rather be in the psychology or philosophy department. After that, we get a gradual introduction to linear regression and data distributions in the context of algorithms. This section is nicely done, as it is well-thought-out and of solid pedagogical value. The authors go slowly and explain all of the terminology in concise and straightforward terms. Same goes for the next gradual step, multiple linear regression. A big plus here is the review of assumptions that must be met if the test is actually to be used, a subject of paramount importance that is usually glossed over or buried in many textbooks.
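For readers who want to see what such an assumption check looks like in practice, here is a minimal Python sketch (the book itself works in R; the synthetic data and the particular checks below are illustrative assumptions, not examples from the text).

# Minimal sketch: fit a linear regression, then check two standard
# assumptions -- residual normality and roughly constant variance.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 5.0 + 3.0 * x + rng.normal(0, 1, 200)   # synthetic data

X = sm.add_constant(x)               # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                    # estimated intercept and slope

resid = fit.resid
print(stats.shapiro(resid))          # normality test on the residuals
# Rough homoscedasticity check: residual spread should be similar
# across the range of x.
print(resid[x < 5].std(), resid[x >= 5].std())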
K-Nearest Neighbor algorithms get a similar fine treatment with note taken of scaling problems and explanations of each of the more commonly used distance metrics. However, we soon get hot and heavy into mathematical symbols and expressions, as well as R code that can quickly bury the uninitiated. This is typified by the inclusion of texts by Hastie and Tibshirani as well as Casella and Berger in the suggested reading. These are anything but introductory texts and will not be profitably read by many scientists, business types and technicians, some of whom are actually doing data analyses with large data sets or trying to understand reports from others. And the authors do state in the preface: "Don't expect a machine learning textbook. Instead, expect full immersion into the multifaceted aspects of data science from multiple points of view. This is a survey of the existing landscape ..." but it is not an extensive how-to manual.
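The scaling problem noted above is easy to demonstrate: k-NN classifies by raw distance, so a feature measured on a large numeric scale drowns out one measured on a small scale unless the features are standardized first. A hedged Python sketch with synthetic data (not an example from the book):

# Illustrative sketch: why feature scaling matters for k-NN.
# Income (tens of thousands) dwarfs age (tens) in Euclidean distance,
# so without scaling the classifier effectively ignores age.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
age = rng.uniform(20, 70, 300)
income = rng.uniform(20_000, 120_000, 300)
X = np.column_stack([age, income])
y = (age > 45).astype(int)          # the signal lives entirely in age

unscaled = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("unscaled accuracy:", unscaled.score(X, y))

X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance
scaled = KNeighborsClassifier(n_neighbors=5).fit(X_std, y)
print("scaled accuracy:", scaled.score(X_std, y))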
For those new to R, a lot more introduction is needed than merely snippets of code. I find that many authors and even commercial vendors touting the marvels of their software leave out the very first step of actually getting the data into the program. Usually, they have the data set cleaned and prepped and pre-loaded into the program. This can require extra steps in areas where databases need to be matched as to data and labels.
There are other, small technical glitches, such as Figure 4.1, where the text is way too small and light to read. Also, much in that chapter sounds as though it was addressed primarily to the IT department, so the back-cover comments quoted above ring true: you already need (what I consider to be advanced) knowledge of statistics and computer programming, as well as some domain knowledge in the area of work.
To summarize, this interesting book has many useful hints, tips, and tricks for addressing specific types of problems, as well as their pitfalls. The hammer-and-nail story with linear regression is classic! Explanations of algorithms are excellent, and there are also interesting asides on people and the history of algorithms, statistics, etcetera. It was also very nice to see all the known variants of key words describing variables and analytic features, terminology that is often quite confusing to the novice. I would have appreciated far more scientific examples than the business ones that were in abundance. However, author/contributor backgrounds must be considered.
Interested readers are strongly urged to go to the book's site at Amazon.com and read sections of the scanned-in pages. While having many pluses, this book is not for every budding data analyst.
AVAILABILITY
Doing Data Science: Straight Talk from the Front Line, by Rachel Schutt and Cathy O'Neil. O'Reilly Media, Inc., Sebastopol, CA. 406 pp. (2014). $39.99. ISBN: 1449358659
John Wass is a statistician based in Chicago, IL. He may be reached at editor@ScientificComputing.com.
Wass, John A.
Hidden in the algorithms: a new book argues that data science may serve to reinforce inequality
Arthur Goldhammer
The American Prospect. 28.1 (Winter 2017): p109.
Copyright: COPYRIGHT 2017 The American Prospect, Inc.
http://www.prospect.org/
Full Text:
WEAPONS OF MATH DESTRUCTION: HOW BIG DATA INCREASES INEQUALITY AND THREATENS DEMOCRACY
BY CATHY O'NEIL
Crown
In her catchily titled book, Weapons of Math Destruction, Cathy O'Neil, a number theorist turned data scientist, delivers a simple but important message: Statistical models are everywhere, and they exert increasing power over many aspects of our daily lives. Data collected by occult means and analyzed by algorithms of often dubious validity help to determine who gets a mortgage, who goes to college, what you pay for insurance, who gets what job, what level of scrutiny you will be subjected to when you fly, how aggressively your neighborhood will be policed, and how you will be treated if arrested.
No one will be surprised to learn that ever more powerful computers processing rapidly expanding volumes of data have become a ubiquitous tool of decision-makers in many areas. O'Neil drives the point home by assembling numerous examples ranging from college ranking systems to payday loan-sharking and hedge-fund trading. Perversely, many of the algorithms used in these analyses privilege the already-privileged and handicap the already-handicapped: If you've been treated for depression, you're less likely to find work and therefore more likely to relapse; if you're a "borrower from the rough section of East Oakland," you'll pay higher interest on your credit card even though you're "already struggling." Whether intentionally or unintentionally, the automation of selection processes thus increases inequality.
Although the author is herself a working data scientist, she takes great pains to avoid technical jargon and to put her argument in the simplest terms possible. To explain the intuition that underlies the application of statistics to the real world, she begins with baseball, the locus classicus for all American stats mavens. Anyone can understand why fielders will shift their positions toward right field when facing a slugger known to hit that way more often than not.
Not all statistical models are so easy to grasp, however. O'Neil distinguishes between healthy models and unhealthy ones. Healthy models are transparent. The cogs and wheels that make them function are exposed for all to see, understand, and evaluate. They are continuously updated as new data flow in. If the model makes inaccurate predictions, it can be corrected and tested against still newer data to see if it improves. If it cannot be improved--if its predictions remain erratic--then it should be scrapped, since the inability to improve the model suggests inadequate understanding of the underlying process. Still, no one should mistake the existence of a sophisticated algorithm for proof that it is valid, trustworthy, or harmless.
O'Neil's book can be read as a plea to her fellow data scientists to take a Hippocratic oath for the age of big data: Above all, a good algorithm should do no harm. She identifies any number of cases in which that oath has been violated. One that comes in for special attention is an algorithm designed to measure the "value added" by individual teachers in the classroom. A middle school English teacher named Tim Clifford was devastated to learn that he had received an "abysmal 6 out of 100" in a value-added evaluation. When he tried to determine where he had gone wrong, however, he was unable to identify any specific flaw in his teaching. The following year he therefore taught a similar class without changing his approach at all. His score shot up to 96. The experience made him "realize how ridiculous the entire value-added model is when it comes to education." Ill-conceived performance measures not only shame and penalize individuals but also result in bad public policy when administrators place undue confidence in seemingly objective measurements to the detriment of possibly more informative modes of evaluation.
Sometimes, the harm done by algorithms is inflicted not on individuals but on entire groups. For instance, one education consulting firm helps colleges "target the most promising candidates for recruitment" on the basis of ability to pay full tuition or eligibility for outside scholarships. A company in Virginia supplies software to sift through call center traffic to shorten the waiting time of those deemed to be "more profitable prospects." Distilled from such examples, the heart of O'Neil's argument is simply stated: The incessant drive to cut costs and increase profits has discriminatory consequences. Algorithms are touted as antidotes to prejudice, or subjective bias, but in many cases they simply replace subjective bias with what can only be called objective bias: Even if there is no intent to treat people unequally, inequality is built into the machinery of choice.
The evaluation of applicants for auto insurance offers an interesting example. In 2015, Consumer Reports researchers looked into the pricing of auto insurance policies, analyzing some two billion price quotes from around the country. What they discovered was that insurers rated applicants not only on the basis of their driving records but also on information gleaned from credit reports. This so-called "proxy data" weighed more heavily than a driver's actual safety history: "In New York State, for example, a dip in a driver's credit rating from 'excellent' to merely 'good' could jack up the annual cost of insurance by $255."
Since poor people tend to have worse credit ratings than those better off, the application of this algorithm was inherently discriminatory. The investigators found that because of reliance on proxy data such as credit scores and high school grades, the pricing of auto insurance was "wildly unfair." But why would insurers choose to weigh such proxy data more heavily than actual driving records in their algorithm?
O'Neil's answer reveals both the power and the weakness of her approach. "Automatic systems," she writes, "can plow through credit scores with great efficiency and at enormous scale." This is certainly true, but such systems can also plow through driving records. Why prefer credit scores? Is electronic access easier to obtain than access to driving records? This may well be the case, but the book provides no evidence of it. Instead, the author abruptly shifts gears to attack the motives of the insurance companies rather than the algorithm they use, which is her ostensible subject: "I would argue that the chief reason has to do with profits. If an insurer has a system that can pull in an extra $1,552 a year from a driver with a clean record, why change it? The victims of their WMD ... are more likely to be poor and less educated, a good number of them immigrants. They're less likely to know that they're being ripped off. And in neighborhoods with more payday loan offices than insurance brokers, it's harder to shop for lower rates. In short, while an e-score might not correlate with safe driving, it does create a lucrative pool of vulnerable drivers. Many of them are desperate to drive--their jobs depend on it. Overcharging them is good for the bottom line."
This is vivid writing. It is also highly tendentious. No justification for the $1,552 figure appears in the book, nor is the insurers' side of the story aired at all. The Consumer Reports article cited is more informative. We learn that "car insurers didn't use credit scores until the mid-1990s. That's when several of them, working with the company that created the FICO score, started testing the theory that the scores might help to predict claim losses. They kept what they were doing hush-hush. By 2006, almost every insurer was using credit scores to set prices. But two-thirds of consumers surveyed by the Government Accountability Office at about the same time said they had no idea that their credit could affect what they paid for insurance."
From this brief account we learn that insurers did indeed follow O'Neil's prescription to test their models against real out-of-sample data. Did they do so in order to hone a strategy for ripping off the poor and vulnerable, as O'Neil suggests? Or was it that risk pools based on credit scores proved more accurate in predicting the likelihood of future accidents than risk pools defined by past driving records? The latter is a logical possibility that one would expect a mathematician like O'Neil to consider before indicting the motives of the insurers.
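Whatever the insurers' motives, the arithmetic of proxy weighting is easy to illustrate. Below is a toy pricing sketch in Python; every weight and figure is invented for this illustration and comes from neither O'Neil's book nor the Consumer Reports study.

# Toy pricing model (all weights and figures invented): when a credit
# proxy outweighs the driving record, a clean-record driver with poor
# credit can pay more than a risky driver with excellent credit.
BASE = 1_000  # annual base premium, dollars

def premium(credit_penalty, driving_penalty,
            w_credit=0.8, w_driving=0.2):
    return BASE * (1 + w_credit * credit_penalty + w_driving * driving_penalty)

clean_poor_credit = premium(credit_penalty=1.5, driving_penalty=0.0)
dui_good_credit = premium(credit_penalty=0.0, driving_penalty=1.0)
print(clean_poor_credit, dui_good_credit)   # 2200.0 vs 1200.0

Under these invented weights the pricing pattern the reviewers describe falls out immediately; the substantive dispute between O'Neil and the insurers is over whether such weights are predictive or merely profitable.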
Of course, even if credit scores are a useful proxy for accident-proneness, we might conclude that using them to construct risk pools is inherently unfair. O'Neil's remedy is to insist that the algorithms used in constructing statistical instruments be made transparent and subjected to scrutiny by stakeholders. This is a reasonable proposal, but it won't help with another problem she identifies: the gaming of algorithms. For instance, Baylor University administrators used knowledge of how U.S. News and World Report computes its college rankings to improve the standing of their institution. Because the U.S. News algorithm was at least partially transparent, administrators "paid the fee for admitted students to retake the SAT" in order to "boost their scores--and Baylor's ranking." Hence transparency is no panacea.
Weapons of Math Destruction provides a handy map to a few of the many areas of our lives over which invisible algorithms have gained some control. As the empire of big data continues to expand, Cathy O'Neil's reminder of the need for vigilance is welcome and necessary, despite the occasional breathlessness of her prose. Patience and rigor are not what one expects from a crier of alarm, a role for which O'Neil is particularly well-suited and in which she performs admirably.
Arthur Goldhammer is a writer, translator, and affiliate of the Center for European Studies at Harvard.
Goldhammer, Arthur
Algorithms in education marked 'WMDs' in data scientist's new book
Jamaal Abdul-Alim
Diverse Issues in Higher Education. 33.21 (Nov. 17, 2016): p22.
Copyright: COPYRIGHT 2016 Cox, Matthews & Associates
http://diverseeducation.com/
Full Text:
The public deserves to know more about the mathematical methods that are being used behind the scenes to make decisions that impact our everyday lives--including who gets into college and who gets targeted by for-profit colleges.
That's the message that Cathy O'Neil, a data scientist, delivered during a recent discussion of her new book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
The objects in her crosshairs are algorithms--the computational procedures that, among other things, can be used to achieve ratings, scores or a particular set of results from sets of data or pieces of information.
In her book, O'Neil takes aim at a number of areas in higher education in which algorithms she considers to be weapons of math destruction--or WMDs--are being used. She writes of everything from how WMDs are "poisoning the college admissions process" to how for-profit colleges deploy "predatory ads" online to entice the poor to take out large student loans and hand the money over to the schools, a path that often leaves them saddled with debt and degrees of dubious value.
The only way to stop the algorithmic abuse is to shine a light on the algorithms in order to better understand their potentially harmful effects, O'Neil said during a recent discussion of her book at Politics & Prose, a popular bookstore and cafe in Washington, D.C.
"The public needs to stop trusting algorithms, push back and demand accountability because when you have secrecy, you have no accountability," O'Neil said in her talk.
O'Neil cited several examples of WMDs from her book, from how teacher evaluations rely on faulty and questionable student test score data to fire teachers to how online "personality tests" are being used as de facto mental health exams.
Interestingly, the inspiration for her book stems from an epiphany she had during an encounter with a venture capitalist who had an aversion to online ads from the University of Phoenix, an online for-profit school.
It happened when the prospective investor met with O'Neil to discuss investing in a start-up she was working on at the time that dealt with targeted online ads.
O'Neil said she was taken aback when the prospective investor remarked during the meeting that he hoped to see the day when he could peruse the internet and be "offered trips to Aruba and jet skis and never again have to see a University of Phoenix ad because those aren't for people like me."
"I was like, 'Whatever happened to the democratizing force of the internet?'" O'Neil said. "This is the goal of people constructing the modern internet, which is to segregate people and silo them by class, so that we--the technologists--have opportunities like toys and things we like to play with, and people on the other side of the spectrum can be preyed on."
O'Neil said she had never seen a University of Phoenix ad before that encounter. Now, she says, she wants to dismantle how for-profit colleges use ads to "target and fleece the population most in need."
"They sell them the promise of an education and a tantalizing glimpse of upward mobility--while plunging them deeper into debt," O'Neil writes in her book. "They take advantage of the pressing need in poor households, along with their ignorance and their aspirations, then they exploit it."
But O'Neil also takes aim at college admission models--which she says are derived in part from the U.S. News & World Report college rankings. She says they drive well-to-do parents to spend thousands of dollars on admissions consultants and special courses to teach their children how to flourish in the college admission process.
"Each one is a mini-WMD," O'Neil writes of the college admission models in her book. "These models lead students and their parents to run in frantic circles and spend obscene amounts of money. And they're opaque." The "victims," she says are the vast majority of poor and middle-class families who "don't have thousands of dollars to spend on courses and consultants."
"They miss out on precious insider knowledge," O'Neil writes. "The result is an education system that favors the privileged. It tilts against needy students, locking out the great majority of them--and pushing them down a path toward poverty. It deepens the social divide."
--Jamaal Abdul-Alim can be reached at jabdul-alim@diverseeducation.com
Schutt, Rachel. Doing data science
S.-T. Kim
CHOICE: Current Reviews for Academic Libraries. 51.12 (Aug. 2014): p2224.
Copyright: COPYRIGHT 2014 American Library Association CHOICE
http://www.ala.org/acrl/choice/about
Full Text:
Schutt, Rachel. Doing data science, by Rachel Schutt and Cathy O'Neil. O'Reilly, 2014. 375p index ISBN 9781449358655 pbk, $39.99
51-6803
QA76
Schutt (News Corp.; adjunct, Columbia Univ.) and O'Neil (Johnson Research Labs) present a timely introduction to data science in this current era of big data. Their book, based on a course Schutt gave at Columbia (with numerous guest lecturers) and O'Neil's blog entries about the course
Kim, S.-T.
Review: Weapons of Math Destruction
In an important new book, Cathy O'Neil warns us that algorithms can and do perpetuate inequality
By Evelyn Lamb on August 31, 2016
“The technology already exists. It’s only the will we’re lacking.” These sentences from Cathy O’Neil’s new book Weapons of Math Destruction have been haunting me since I read it. They come from the last chapter of a book in which she has illustrated again and again how, in the words of her subtitle, "big data increases inequality and threatens democracy." With Facebook's new trending topics algorithm and data-driven policing in the news, the book is certainly timely.
Weapons of math destruction, which O’Neil refers to throughout the book as WMDs, are mathematical models or algorithms that claim to quantify important traits—teacher quality, recidivism risk, creditworthiness—but have harmful outcomes and often reinforce inequality, keeping the poor poor and the rich rich. They have three things in common: opacity, scale, and damage. They are often proprietary or otherwise shielded from prying eyes, so they have the effect of being a black box. They affect large numbers of people, increasing the chances that they get it wrong for some of them. And they have a negative effect on people, perhaps by encoding racism or other biases into an algorithm or enabling predatory companies to advertise selectively to vulnerable people, or even by causing a global financial crisis.
O’Neil is an ideal person to write this book. She is an academic mathematician turned Wall Street quant turned data scientist who has been involved in Occupy Wall Street and recently started an algorithmic auditing company. She is one of the strongest voices speaking out for limiting the ways we allow algorithms to influence our lives and against the notion that an algorithm, because it is implemented by an unemotional machine, cannot perpetrate bias or injustice.
Many people think of Wall Street and hedge funds when they think of big data and algorithms making decisions. As books such as The Big Short and All the Devils Are Here grimly chronicle, subprime mortgages are a perfect example of a WMD. Most of the people buying, selling, and even rating them had no idea how risky they were, and the economy is still reeling from their effects.
O’Neil talks about financial WMDs and her experiences, but the examples in her book come from many other facets of life as well: college rankings, employment application screeners, policing and sentencing algorithms, workplace wellness programs, and the many inappropriate ways credit scores reward the rich and punish the poor. As an example of the latter, she shares the galling statistic that “in Florida, adults with clean driving records and poor credit scores paid an average of $1552 more than the same drivers with excellent credit and a drunk driving conviction.” (Emphasis hers.)
She shares stories of people who have been deemed unworthy in some way by an algorithm. There’s the highly-regarded teacher who is fired due to a low score on a teacher assessment tool, the college student who couldn’t get a minimum wage job at a grocery store due to his answers on a personality test, the people whose credit card spending limits were lowered because they shopped at certain stores. To add insult to injury, the algorithms that judged them are completely opaque and unassailable. People often have no recourse when the algorithm makes a mistake.
Many WMDs create feedback loops that perpetuate injustice. Recidivism models and predictive policing algorithms—programs that send officers to patrol certain locations based on crime data—are rife with the potential for harmful feedback loops. For example, a recidivism model may ask about the person’s first encounter with law enforcement. Due to racist policing practices such as stop and frisk, black people are likely to have that first encounter earlier than white people. If the model takes this measure into account, it will probably deem a black person more likely to reoffend. But these models are harmful even beyond their potential to be racist. O’Neil writes,
A person who scores as ‘high risk’ is likely to be unemployed and to come from a neighborhood where many of his friends and family have had run-ins with the law. Thanks in part to the resulting high score on the evaluation, he gets a longer sentence, locking him away for more years in a prison where he’s surrounded by fellow criminals—which raises the likelihood that he’ll return to prison. He is finally released into the same poor neighborhood, this time with a criminal record, which makes it that much harder to find a job. If he commits another crime, the recidivism model can claim another success. But in fact the model itself contributes to a toxic cycle and helps to sustain it.
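The cycle in that passage can be made concrete with a toy simulation. Every probability below is invented purely for illustration; the point is only that when a score lengthens sentences, and longer incarceration itself raises the chance of reoffending, the score manufactures the statistics that appear to vindicate it.

# Toy simulation (invented parameters): a risk score that lengthens
# sentences can raise reoffense rates and then cite them as "accuracy."
import random
random.seed(0)

def high_risk_reoffense_rate(score_drives_sentence, trials=100_000):
    hits, highs = 0, 0
    for _ in range(trials):
        high_risk = random.random() < 0.5        # score assigned at sentencing
        if score_drives_sentence:
            long_sentence = high_risk            # the score lengthens the sentence
        else:
            long_sentence = random.random() < 0.5
        # Assumed causal effect: longer incarceration raises reoffense risk.
        p = 0.30 + (0.20 if long_sentence else 0.0)
        if high_risk:
            highs += 1
            hits += random.random() < p
    return hits / highs

print(high_risk_reoffense_rate(True))    # ~0.50: the loop inflates the rate
print(high_risk_reoffense_rate(False))   # ~0.40: baseline without the loop

The gap between the two printed rates is produced entirely by the model's own influence on sentencing, which is exactly the self-validating loop O'Neil describes.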
O’Neil’s book is important in part because, as she points out, an insidious aspect of WMDs is the fact that they are invisible to those of us with more power and privilege in this society. As a white person living in a relatively affluent neighborhood, I am not targeted with ads for predatory payday lenders while I browse the web or harassed by police officers who are patrolling “sketchy” neighborhoods because an algorithm sends them there. People like me need to know that these things are happening to others and learn more about how to fight them.
While Weapons of Math Destruction is full of hard truths and grim statistics, it is also accessible and even entertaining. O’Neil’s writing is direct and easy to read—I devoured it in an afternoon. And the book is not all grim. In the last chapter, she shares some ideas of how we can disarm WMDs and use big data for good. She proposes a Hippocratic Oath for data scientists and writes about how to regulate math models. Let’s return to the sentences I opened with: “The technology already exists. It’s only the will we’re lacking.” That is bleak—we aren’t doing what we can—but should give us some hope as well. The technology exists! If we develop the will, we can use big data to advance equality and justice.
WEAPONS OF MATH DESTRUCTION
How Big Data Increases Inequality and Threatens Democracy
by Cathy O'Neil
Best of 2016
KIRKUS REVIEW
How ill-conceived algorithms now micromanage America’s economy, from advertising to prisons.
“Welcome to the dark side of Big Data,” writes math guru O’Neil (Doing Data Science: Straight Talk from the Frontline, 2013, etc.), a blogger (mathbabe.org) and former quantitative analyst at the hedge fund D.E. Shaw. In this simultaneously illuminating and disturbing account, she describes the many ways in which widely used mathematical models—based on “prejudice, misunderstanding, and bias”—tend to punish the poor and reward the rich. The most harmful such models, which she calls “Weapons of Math Destruction,” often have devastating effects on people when they are going to college, borrowing money, getting sentenced to prison, or finding and holding a job. For example: credit scores are used to evaluate potential hires (assuming bad scores correlate with bad job performance, which is often not true); for-profit colleges use data to target and prey on vulnerable strivers, often plunging them into debt; auto insurance companies judge applicants by their consumer patterns rather than their driving records; crime predictive software often leads police to focus on nuisance crimes in impoverished neighborhoods. As the author notes, the harmful effects are apparent “when a poor minority teenager gets stopped, roughed up, and put on warning by the local police, or when a gas station attendant who lives in a poor zip code gets hit with a higher insurance bill.” She notes the same mathematical models “place the comfortable classes of society in their own marketing silos,” jetting them off to vacations in Aruba, wait-listing them at Wharton, and generally making their lives “smarter and easier.” The author writes with passion—a few years ago she became disillusioned over her hedge fund modeling and joined the Occupy movement—but with the authority of a former Barnard professor who is outraged at the increasingly wrongheaded use of mathematics. She convincingly argues for both more responsible modeling and federal regulation.
An unusually lucid and readable look at the daunting algorithms that govern so many aspects of our lives.
Pub Date: Sept. 6th, 2016
ISBN: 978-0-553-41881-1
Page count: 272pp
Publisher: Crown
Review Posted Online: July 19th, 2016
Kirkus Reviews Issue: Aug. 1st, 2016
People and Technology
By CLAY SHIRKY OCT. 3, 2016
WEAPONS OF MATH DESTRUCTION
How Big Data Increases Inequality and Threatens Democracy
By Cathy O’Neil
259 pp. Crown, $26.
O’Neil’s book offers a frightening look at how algorithms are increasingly regulating people. A Ph.D. in math from Harvard, she’s also worked as a Wall Street quant. Her knowledge of the power and risks of mathematical models, coupled with a gift for analogy, makes her one of the most valuable observers of the continuing weaponization of big data.
Her central complaint is that of all the ways organizations use data, there is a pernicious subset — the W.M.D.s of the title — that are opaque, operate at enormous scale and can damage people’s lives. Her first case study is the financial crisis of 2008, kicked off by mathematical models of mortgage payments that were precise, complex and disastrously wrong.
Algorithms have these same effects in teacher evaluations, hiring practices, college admissions. Often, what an organization cares about — job performance, recidivism — can’t be measured directly, so imperfect proxies are used. Credit scores, for example, stand in for employee worth.
O’Neil does offer some design principles for better algorithms, like adding feedback loops that adjust to past mistakes. But like many books that diagnose social risks, the section on solutions is weaker than the illustration of the problem. Understanding just how big the problem is, though, is critical, and O’Neil does a masterly job explaining the pervasiveness and risks of the algorithms that regulate our lives.
Weapons of Math Destruction
Posted on September 6, 2016 by woit
Cathy O’Neil’s important new book Weapons of Math Destruction is out today, and if you’re at all interested in the social significance of how mathematics is now being used, you should go out and get a copy. She has been blogging for quite a while at Mathbabe, which you should be following, and her blog is a good place to start if your attention span is too short for the book.
Cathy has had an interesting career path, including a period as my colleague here in the math department at Columbia. She left here to pursue excitement and fortune at a top hedge fund, D.E. Shaw, where she had a front-row seat at the 2008 financial industry collapse. A large factor in that collapse was the role played by mathematical models, and her book explains some of that story (for another take on this, there’s Models.Behaving.Badly from another Columbia colleague, Emanuel Derman). As far as I’ve ever been able to figure out, the role of mathematical modeling in the mortgage-backed securities debacle was as a straightforward accessory to fraud. Dubious and fraudulent lending was packaged using mathematics into something that could be marketed as a relatively safe investment, with one main role of the models being to make it hard for others to figure out what was going on. This worked quite well for those selling these things, with the models successfully doing their job of obscuring the fraud and keeping most everyone out of jail.
While this part of the story is now an old and well-worn one, what’s new and important about Weapons of Math Destruction is its examination of the much wider role that mathematical modeling now plays in our society. Cathy went on from the job at D.E. Shaw to work first in risk management and later as a data scientist at an internet media start-up. There she saw some of the same processes at work:
In fact, I saw all kinds of parallels between finance and Big Data. Both industries gobble up the same pool of talent, much of it from elite universities like MIT, Princeton and Stanford. These new hires are ravenous for success and have been focused on external metrics – like SAT scores and college admissions – their entire lives. Whether in finance or tech, the message they’ve received is that they will be rich, that they will run the world…
In both of these industries, the real world, with all its messiness, sits apart. The inclination is to replace people with data trails, turning them into more effective shoppers, voters, or workers to optimize some objective… More and more I worried about the separation between technical models and real people, and about the moral repercussions of that separation. In fact, I saw the same pattern emerging that I’d witnessed in finance: a false sense of security was leading to widespread use of imperfect models, self-serving definitions of success, and growing feedback loops. Those who objected were regarded as nostalgic Luddites.
I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, taking in outrageous fortunes and convincing themselves that they deserved it.
The book then goes on to examine various examples of how Big Data and complex algorithms are working out in practice. Some of these include:
The effect of the US News and World Report algorithm for college ranking, as colleges try to game the algorithm, while at the same time well-off families are at work gaming the complexities of elite college admissions systems.
The effects of targeted advertising, especially the way it allows predatory advertisers (some for-profit educational institutions, payday lenders, etc.) to very efficiently go after those most vulnerable to the scam.
The effects of predictive policing, with equality before the law replaced by an algorithm that sends different degrees of law enforcement into different communities.
The effects of automated algorithms sorting and rejecting job applications, with indirect consequences of discrimination against classes of people.
The effects of poorly thought-out algorithms for evaluating teachers, sometimes driving excellent teachers from their jobs.
The effects of algorithms that score credit, determine access to mortgages and to insurance, often with the effect of making sure that those deemed losers stay that way.
Finally, there’s a chapter on Facebook and the way political interests are taking advantage of the detailed information it provides to target their messages, to the detriment of democracy.
To me, Facebook is perhaps the most worrisome of all the Big Data concerns of the book. It now exercises an incredible amount of influence over what information people see, with this influence sometimes being sold to the highest bidder. Together with Amazon, Google and Apple, it has brought our economy and society under the control of monopolies to an unparalleled degree, monopolies that monitor our every move. In the context of government surveillance, Edward Snowden remarked that we are now “tagged animals, the primary difference being that we paid for the tags and they’re in our pockets.” A very small number of huge, extremely wealthy corporations have even greater access to those tags than the government does, recording every movement, communication with others, and even every train of thought as we interact with the web.
These organizations are just starting to explore how to optimize their use of our tags, and thus of us. Many of the students starting classes here today in the math department will, with the training we give them, go on to careers working for these companies. As they go off to work on the algorithms that will govern the lives of all of us, I hope they’ll start by reading this book and thinking about the issues it raises.
Algorithm is gonna get you: Cathy O’Neil’s Weapons of Math Destruction explores how we’re trapped by Big Data
Molly Sauter, Special to National Post | October 19, 2016 | Last Updated: Oct 31 11:15 AM ET
Weapons of Math Destruction
By Cathy O’Neil
Crown
272 pp; $35
Reading Cathy O’Neil’s indispensable Weapons of Math Destruction, her thorough and often upsetting account of the complex predictive and sorting algorithms that have become powerful forces in the everyday lives of most people, a line from the late great Ursula Franklin’s The Real World of Technology kept echoing in my mind. In 1990, Franklin wrote, “We have lost the institution of government in terms of responsibility and accountability to the people. We now have nothing but a bunch of managers, who run the country to make it safe for technology.” O’Neil’s book in places reads like a confirmation of Franklin’s, as she describes the minute ways in which certain algorithms, deployed in realms of human endeavor from finance to job hunting to criminal justice, are not only defining and interpreting the worlds into which they’ve been unleashed, but are also actively shaping them.
O’Neil’s career as a Wall Street quant and data scientist gives her an insider’s view into the rise of Big Data, the catch-all term for crunching huge amounts of data pertaining to hundreds or thousands or millions of people at once, looking for emergent trends that help predict behaviour. Despite the technical complexity of its subject, Weapons of Math Destruction lucidly guides readers through these complex modelling systems that O’Neil describes as “opinions embedded in mathematics.” Not all such models attract her ire: the titular “weapons of math destruction” are a specific species of algorithmic bad actors in which “poisonous assumptions are camouflaged by math and go largely untested and unquestioned.” These “WMDs” tend to target the poor or marginalized, contain self-validating feedback loops, and can have sharply negative impacts on those caught in their nets.
O’Neil highlights the centrality of the profit motive in the development and deployment of these algorithms, which are not, for the most part, deployed to make the lives of those whom they sort easier or better. These algorithms execute a series of invisible transformations: person into consumer, consumer into mark, mark into dollar sign. Profitability becomes not only the stamp of a successful algorithm, O’Neil notes, but also “a stand-in or proxy for the truth.” Individuals are elided into data points, and it becomes easy to confuse the algorithm’s descriptive correlations for those things they are attempting to describe.
One prominent example of this is the use of students’ standardized test scores as a proxy for school quality, as occurred in the United States under a set of regulations called No Child Left Behind. NCLB valorized standardized testing as the key metric for evaluating teachers and schools, leading public schools in the States to devote hours to “teaching to the test,” trying to push up test scores to avoid firings, funding cuts, or even school closures. Scrambles to avoid falling behind in test score rankings have led to at least one major cheating scandal, where teachers in an at-risk school in Atlanta, Georgia, changed students’ answers on a state-wide standardized exam to keep their school from being shuttered.
These algorithms hit the most precarious in our societies the hardest by explicitly targeting them for malicious services (such as payday loans or for-profit college degrees) or by silently, invisibly removing them from contention for jobs, apartments, a place at university, or even by lining them up for harsher penalties from the criminal justice system. O’Neil describes the impact of a mathematical model meant to gauge criminal recidivism risk this way:
“A person who scores ‘high risk’ is likely to be unemployed and to come from a neighbourhood where many of his friends and family have had run-ins with the law. Thanks in part to the resulting high score on the evaluation, he gets a longer prison sentence, locking him away for more years in a prison where he’s surrounded by fellow criminals – which raises the likelihood that he’ll return to prison. He is finally released into the same poor neighbourhood, this time with a criminal record, which makes it much harder to find a job. If he commits another crime, the recidivism model can claim another success. But in fact the model contributes to a toxic cycle and helps to sustain it.”
Though not all the malignant algorithms O’Neil examines are explicitly designed to separate a sucker from his money, it’s astonishing how often algorithms use money – whether one has it or doesn’t, or how much one is willing to spend to get out of an algorithm’s grasp – as the defining characteristic of a person. Often this takes the form of a credit score. In Canada and in the United States, it’s common for landlords to perform credit checks on prospective tenants, and some employers treat the automatic, algorithmic checking of an applicant’s credit as on par with checking their references, particularly with low-skill, entry-level, high-churn jobs that are often lifelines for economically precarious populations. It’s distressing that access to basic human needs like housing or the means to earn a living is potentially dependent upon the shibboleth of credit-worthiness.
As Ursula Franklin noted, these algorithms have a shaping impact on the world around us and our behaviour in it. As algorithms are increasingly used as arbiters of quality, corporations and individuals are playing to the algorithms themselves, creating a tunnel-visioned game of maximizing algorithmic gain while disregarding costs and risks.
O’Neil describes the impact of US News & World Report’s ambitious college ranking project, begun in 1983 in an attempt to remedy flagging newsstand sales, on schools themselves. As the US News rankings became more popular among parents attempting to wade through the thousands of higher education options, colleges and universities began attempting to game specific factors in the US News ranking algorithm (which was developed by journalists, not higher education experts), hoping to up their position. Suddenly colleges were paying admitted students to retake the SATs in order to increase the average score for the incoming class, or shifting their admissions and financial aid policies to appear more selective and increase their four-year graduation rate.
Did these moves make these colleges “better”? Maybe. But unless “better” can be measured algorithmically, that question is unlikely to be seriously asked or seriously answered. An over-reliance on these mathematical models prevents us from asking whether the quantification of knowledge, the endless translation of human concepts of quality and of worth – of human beings themselves – into strings of numbers and collapsed data points is a good thing. Is quantitative knowledge always the best type of knowledge for a given situation? While Weapons of Math Destruction does not dive into epistemology, it will hopefully open a space where such conversations can occur.
O’Neil’s book is an excellent primer on the ethical and moral risks of Big Data and an algorithmically dependent world. It compellingly describes algorithms (and those who use them) behaving badly, and advocates for society to do better. O’Neil is no quantitative apostate. She believes in the beneficial potential of Big Data, and that it can do better. For those curious about how Big Data can help them and their businesses, or how it has been reshaping the world around them, Weapons of Math Destruction is an essential starting place.
Doing Data Science
I just finished reading Doing Data Science: Straight Talk from the Frontline, an O’Reilly book by Cathy O’Neil (@mathbabedotorg) and Rachel Schutt (Columbia Data Science blog).
First let me say, I really enjoyed this book! I thought it gave a great overview of Data Science, which is very valuable at this early stage in my data science journey. The authors attempt to define Data Science, but also explain that the definition is evolving, and show throughout the book all of the different types of things that can be categorized as data science activities. I also enjoyed that they emphasize data science teams, and each presenter in the book (each chapter is based on a lecture in the course which had a guest speaker from the field) was introduced with their level of expertise in the various aspects of data science (see image below). For instance, some were more focused on machine learning, while others focused more on visualization, and they were from a variety of different industries. This was nice because it meant the authors didn’t use the same example problems repeatedly when discussing different techniques.
[Image: Data Scientist Profile (via semanticcommunity)]
Speaking of visualization, I would say that the one negative of the book is that the images were not designed to be printed in black and white, and many are hard to read. There is an image with the caption “red means cancer, green means not”, but the dots all appear to be similar colors of grey. There is an image the students in the class designed to show the various aspects of data science which is basically unreadable because it is tiny and has some text that comes out as grey-on-grey (I happened to find a color version of that image here).
Now, don’t expect to read the book and immediately be able to go out and do all of the activities in the book. First of all, there is a list of prerequisites the authors assume you have. You don’t have to have a deep understanding of all of these fields in order to gain something from the book, but they use terminology at times from linear algebra, statistics, machine learning, and other technical areas, and you would definitely need some of these skills in order to do some of the suggested activities. However, throughout the book are constant definitions and clarifications, references to other texts, websites, and people. I found this to be incredibly useful – any time you want to learn more about a topic, the authors point out how to find more information, and recommend books on the subject.
To make a metaphor, Rachel Schutt and Cathy O’Neil tell you about a great dish someone cooked, and give some general info about the process of making the dish, and what to watch out for when you attempt it yourself. They even include some quotes from the chef about the art of making this particular dish, and tips on preparing and presenting it. But you still have to go out and get the ingredients and tools and learn some cooking techniques and look in some other cookbooks in order to figure out the detailed steps. Then, you have to do a lot of chopping and sautéing and probably burn a few things before you successfully create a similar dish you can serve to your customers. They don’t just hand you a simple recipe, and you are probably a casual at-home cook, not a professional chef yet.
You could describe the book as kind of a “roadmap” to data science. There is some math and some code, but it is much more breadth than depth. The book is not pretentious, and actually warns data scientists against hubris, since overconfidence in a certain tool or method can have a negative impact on your work. There are a lot of “tips”, “things to think about”, and “lessons learned” that I feel give the reader a great sense of what pitfalls you might come across when doing real-world analysis, and how to avoid the common ones, but only a few step-by-step how-to’s and code examples (in R or Python).
Some topics I bookmarked to learn more about that I hadn’t read about before “Doing Data Science” introduced them to me: F-score (a combination of precision and recall – terms defined in the book), Log Returns, Simpson’s Paradox, Exponential Random Graph Models.
Some topics I already knew a little about, but that “Doing Data Science” helped me better understand: various similarity/distance metrics, exploratory data analysis, data leakage, recommendation engines, and confounding variables.
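To give a flavor of what “similarity/distance metrics” means in practice (again, my own sketch rather than the book’s code), here are two common examples – Euclidean distance and cosine similarity – which can disagree about how “close” two items are:

    import math

    def euclidean_distance(a, b):
        # Straight-line distance between two points; smaller means more similar.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors; 1.0 means same direction.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(euclidean_distance(u, v))  # ~3.74: far apart in absolute terms
    print(cosine_similarity(u, v))   # 1.0: identical direction

Which metric is “right” depends on the problem – for a recommendation engine, for example, the direction of a user’s preferences (taste) often matters more than their magnitude (activity level).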
I can imagine that some readers wouldn’t like that the book is “all over the place,” giving not much detail on some topics and a lot of detail all at once on others – too technical and math-y in places, and very “layman’s terms” elsewhere. However, I liked that about the writing. It really touches on everything, and gives you enough direction to know where to go next to learn more. It feels like you’re meeting a bunch of people who have had a variety of experiences in the industry, and you’re all trying to give each other a feel for what you do: being technical enough to be impressive, but clear enough to be accessible, and explaining how you learned your particular subset of skills and where someone can get more info.
I give it 5 out of 5, despite the fact that the images were sometimes unreadable. I also suggest you check out the blog that goes with the course that the book follows: http://columbiadatascience.com/blog/.
Book Review: Doing Data Science
November 16, 2013, by Daniel Gutierrez
O’Reilly Media does it right. Their PR department gives valuable support to the grassroots efforts in the data science community by helping out local Meetup groups. A case in point is how they provided a number of current titles to be given away as raffle prizes for the Los Angeles R User Group, of which I am a member. So I thought I’d review a new O’Reilly book that just came out in October 2013, Doing Data Science: Straight Talk from the Frontline by Rachel Schutt & Cathy O’Neil.
I found this book to be a very odd bird indeed. It is one book you could read from back cover to front cover and not be at a disadvantage. This is because the book is really just a collection of presentations made by various people to a class taught by the primary author, Rachel Schutt, at Columbia University in the Fall of 2012 – Introduction to Data Science. It wasn’t entirely clear what content Schutt was directly responsible for, since only some of the chapters indicate who the contributors were (one of the chapters was contributed by a group of her students!). The co-author, Cathy O’Neil, I’ve encountered before as an outspoken blogger going by the name “mathbabe,” but it wasn’t specifically stated how she became part of the book project, other than that she was one of the students in Schutt’s class. Chapter 6 was partly written by O’Neil.
Both Schutt and O’Neil hold Ph.D.s in fields appropriate to data science, but the book was not “written” by the two; rather, they seem to have performed some kind of editing function on the materials submitted by each contributor and added commentaries of their own. As a result, the book is a hodgepodge of anecdotes, factoids, R code snippets, plots, and mathematics, all from the in-class presentations. I enjoy seeing math in data science books, but the equations in this book were sort of just floating there, requiring the reader to explore further at another time.
Although I have issues with the book, since it is not any sort of text for the field, I did enjoy reading it, with a number of “Ah, I didn’t know that!” moments. Schutt’s credentials in data science are considerable; she worked at Google for a few years around the same time that “data science” was growing up in Silicon Valley. As a result, the book has many memorable anecdotes about the early days of the data science industry, and observations about what makes big data tick. I enjoyed the story about the Google software engineer who accidentally deleted 10 petabytes of data, and I think my favorite quote from the book is from the students’ Chapter 15:
Kaggle competitions could be described as the dick-measuring contests of data science.
With contributors’ chapters on statistical inference, machine learning algorithms, logistic regression, financial modeling, recommendation engines, data visualization, Hadoop, MapReduce, and more, I’d say the book is worth a read – not necessarily as a source for learning data science, but as a high-level guide and short historical account of this young industry. You get to learn about the people, companies, and technologies that have collectively built the data science arena, and you’ll be better for it, especially if you are working to become a data scientist yourself.
January 23, 2014
Book review: "Doing Data Science" by Rachel Schutt and Cathy O'Neil
by Joseph Rickert
Every once in a while a single book comes along to crystallize a new discipline. If books still have this power in the era of electronic media, "Doing Data Science: Straight Talk from the Frontline" by Rachel Schutt and Cathy O’Neil (O'Reilly, 2013) might just be the book that defines data science. "Doing Data Science", which is based on a course that Rachel taught at Columbia University and to which Cathy contributed, is ambitious and multidimensional. It presents data science in all of its messiness as an open-ended practice that is coalescing around an expanding class of problems – problems that are yielding to an interdisciplinary approach that includes ideas and techniques from statistics, computer science, machine learning, social science, and other disciplines.
The book is neither a statistics nor a machine learning text, but there are plenty of examples of statistical models and machine learning algorithms. There is enough R code in the text to get a beginner started on real problems with tools that are immediately useful. There is Python code, a bash shell script, mention of JSON, and a down-to-earth discussion of Hadoop and MapReduce that many should find valuable. My favorite code example is the bash script (p 105) that fetches an Enron spam file and performs some basic word count calculations. Its almost casual insertion into the text, without fanfare and with little explanation, provides a low-key example of the kinds of baseline IT/programmer skills that a newly minted statistician must acquire in order to work effectively as a data scientist.
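For readers curious what that kind of exercise looks like, here is a rough Python analogue of a fetch-and-word-count script of my own devising (the book's version is a bash script, and the URL below is a placeholder, not the book's actual data source):

    import collections
    import re
    import urllib.request

    # Placeholder URL - not the book's actual Enron data source.
    URL = "https://example.com/enron/spam/0001.txt"

    # Fetch the file and decode it, tolerating stray bytes.
    with urllib.request.urlopen(URL) as response:
        text = response.read().decode("utf-8", errors="replace")

    # Lowercase, split into words, and count frequencies.
    words = re.findall(r"[a-z']+", text.lower())
    counts = collections.Counter(words)

    # Print the ten most common words, count first, much like `sort | uniq -c`.
    for word, n in counts.most_common(10):
        print(f"{n:6d} {word}")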
"Doing Data Science" is fairly well balanced in its fusion of the statistics and machine learning world views, but Rachel’s underlying bias as a PhD statistician comes through when it counts. The grounding in linear models and the inclusion of time series models establish the required inferential skills. The discussion of causality shows how statistical inference is essential to obtaining a deep understanding of how things really work, and the chapter on epidemiology provides a glimpse into just how deep and difficult are the problems that statisticians have been wrestling with for generations. (I found the inclusion of this chapter in a data science book to be a delightful surprise.)
It is not only the selection of material, however, that betrays the book's statistical bias. When the authors take on the big questions, their language indicates a statistical mindset. For example, in the discussion following "In what sense does data science deserve the word 'science' in its name?" (p 114), the authors write: “Every design choice you make can be formulated as an hypothesis, against which you will use rigorous testing and experimentation to validate or refute”. This is the language of a Neyman/Pearson-trained statistician trying to pin down the truth. It stands in stark contrast with the machine learning viewpoint espoused in a quote by Kaggle’s Jeremy Howard who, when asked “Can you see any downside to the data-driven, black-box approach that dominates on Kaggle?”, replies:
Some people take the view that you don’t end up with a richer understanding of the problem. But that’s just not true: The algorithms tell you what’s important and what’s not. You might ask why those things are important, but I think that’s less interesting. You end up with a predictive model that works. There is not too much to argue about there.
So, whether you are doing science or not might just be in your intentions and point of view. Schutt and O’Neil do a marvelous job of exploring the tension between the quest for understanding and the blunt success of just getting something that works.
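To make the contrast concrete, here is a toy sketch (my own, not from the book) of what "formulating a design choice as a hypothesis" can look like in code: a two-proportion z-test comparing click-through rates for two hypothetical page designs.

    import math

    def two_proportion_z_test(success_a, n_a, success_b, n_b):
        # Null hypothesis: the two designs have the same click-through rate.
        p_a, p_b = success_a / n_a, success_b / n_b
        pooled = (success_a + success_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        # Two-sided p-value from the standard normal distribution.
        p_value = math.erfc(abs(z) / math.sqrt(2))
        return z, p_value

    # Hypothetical experiment: design A got 120/1000 clicks, design B 90/1000.
    z, p = two_proportion_z_test(120, 1000, 90, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value rejects "no difference"

The Kaggle-style alternative would simply deploy whichever design predicts better, without asking whether the observed difference reflects anything real.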
An unusual aspect of the book is its attempt to understand data science as a cultural phenomenon and to place the technology in a historical and social context. Most textbooks in mathematics, statistics, and science make no mention of how things came to be. Their authors are just under too much pressure to get on with presenting the material to stop and discuss “just what were those guys thinking?”. But Schutt and O’Neil take the time, and the book is richer for it. Mike Driscoll and Drew Conway, two practitioners who early on recognized that data science is something new, are quoted along with other contemporary data scientists who are shaping the discipline both through their work and how they talk about it.
A great strength of the book is its collection of real-world, big-league examples contributed by the guest lecturers to Rachel’s course. Doug Perlson of RealDirect, Jake Hofman of Microsoft Research, Brian Dalessandro and Claudia Perlich, both of Media6Degrees, Kyle Teague of GetGlue, William Cukierski of Kaggle, David Huffaker of Google, Matt Gattis of Hunch.com, Mark Hansen of Columbia University, John Kelly of Morningside Analytics, and David Madigan, Chair of Columbia’s Statistics Department, all bring thoughtful presentations of difficult problems with which they have struggled. The perspective and insight of these practicing data scientists and statisticians are invaluable. Claudia Perlich’s discussion of data leakage alone is probably worth the price of the book.
A minor fault of the book is the occasional lapse into the hip vulgar. Someone being “pissed off” and talking about a model “that would totally suck” are probably innocuous enough phrases, but describing a vector as “huge ass” doesn’t really contribute to clarity. In a book that stresses communication, language counts. Nevertheless, "Doing Data Science" is a really “good read”. The authors have done a remarkable job of integrating class notes, their respective blogs, and the presentations of the guest speakers into a single, engaging voice that mostly speaks clearly to the reader. I think this book will appeal to a wide audience. Beginners asking the question “How do I get into data science?” will find the book to be a guide that will take them a long way. Accomplished data scientists will find a perspective on their profession that they should appreciate as being both provocative and valuable. "Doing Data Science" argues eloquently for a technology that respects humanist ideals and ethical considerations. We should all be asking "What problems should I be working on?", "Am I doing science or not?", and "What are the social and ethical implications of my work?". Finally, technical managers charged with assembling a data science team, and other interested outsiders, should find the book helpful in getting beyond the hype and having a look at what it really takes to squeeze insight from data.