
This topic has been discussed at length on HN and has been the topic of many flame wars.

If you're genuinely curious for an answer, in the United States it's absolutely the case that you are legally

> obligated to throw out the conclusion

when it comes to many of the things you mentioned (e.g., you can't consider race, or often gender and sexual orientation depending on the state, when you evaluate a loan/employment application).

The reason for those laws is that, historically, overtly and openly racist people used the loan application process to discriminate in housing.

Asking to be able to use any accurate statistical model for any application is an inherently political request, because bad actors in the past have attempted to exclude people based upon skin color. We can't make that discrimination legal whenever someone can dream up a plausible mathematical model justifying an ultimately racial intent.

The real point of my post is that a discussion on this topic devolves into a flame war unless everyone agrees that we can't turn p-hacking into a legally justifiable way of allowing racial discrimination in important social processes like housing loan or employment applications.



> We can't make that discrimination legal

> an ultimately racial intent.

I reject your assertion that by studying data, data scientists or analysts are somehow complicit in discrimination or any -ism. Their job is to make business decisions based on data, WITHOUT relying on "gut feel" or other human biases. I also reject the expectation that data scientists have an obligation (or the ability) to "fix" whatever biases may be revealed by the data.

> p-hacking

Again, you are making assumptions that data scientists are somehow evil, engaging in shady or illegal tactics to promote discrimination rather than simply doing their job in a straightforward manner. P-hacking is often done by individual researchers looking to get a study published in a scholarly journal (as opposed to admitting that months or years of research failed to reach a statistically valid conclusion), rarely by companies who are looking to make profitable decisions based on what the data is indicating.
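For readers who haven't met the term: p-hacking is essentially running many tests on noise and reporting only the one that came out "significant". A toy sketch in Python (purely illustrative, invented numbers, not a claim about anyone's actual workflow):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # 20 candidate "outcomes", all pure noise: there is no real effect anywhere
    group_a = rng.normal(size=(20, 50))
    group_b = rng.normal(size=(20, 50))

    p_values = [stats.ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]

    # Reporting only the best-looking test and hiding the other 19 is the
    # textbook form of p-hacking
    print(min(p_values))  # frequently < 0.05 despite zero true effect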


It appears that you make the false assumption that the data itself is unbiased and always factually correct. This is untrue. Data does not appear magically in a dataset. It is the interpretation of the world by humans and may therefore carry the original bias, intended or not, of those humans. This is why analysts have to think about what the data means whenever they do their analyses. I would say that is their ethical responsibility.


> think about what the data means

Of course they do that; that is literally their job.

But to reverse-engineer the biases that may or may not exist in an original data set -- please explain how this should be accomplished, because I don't see how someone could accurately quantify the amount or degree of race/sex/age/religion/nationality-ism without introducing additional "bias" based on that person's own opinion.

> Data does not appear magically in a dataset.

Right, so why isn't the boss, or exec, or department head, or 3rd-party, who sourced the data responsible for de-biasing the data before even handing it off to the data scientist, so s/he can just do the job of data science-ing, and not political science-ing? You're putting a whole lot of "ethical responsibility" on just one person -- ironically, the one least likely to be good at interpreting human emotional tendencies -- within a much larger ecosystem.


That's the point, it is very difficult to correctly interpret analyses. That's why you don't take conclusions for granted, and work from the basis of what you think is biologically relevant. There is usually a whole phase preceding the actual analysis. You can visualise potential relationships in directed acyclic graphs, to try and see where bias might be introduced. However, that phase is very often skipped by medical researchers.
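As a rough illustration of that pre-analysis phase (hypothetical variable names, networkx used only for convenience), one way to write down the assumed causal structure and look for paths where confounding could bias the result might be:

    import networkx as nx

    # Hypothetical causal assumptions, written down before touching the data
    dag = nx.DiGraph([
        ("age", "exposure"),
        ("age", "outcome"),
        ("exposure", "outcome"),
    ])

    # Any exposure-outcome path other than the direct edge runs through a
    # potential confounder (here "age") and signals where bias can creep in
    undirected = dag.to_undirected()
    for path in nx.all_simple_paths(undirected, "exposure", "outcome"):
        if path[1] != "outcome":  # skip the direct edge
            print("potential confounding path:", " - ".join(path))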

I never said the analyst was the sole person bearing the responsibility. You are right that responsibility must be expected just as much from those who designed the information model and those who collected the data as from the person who ultimately analyses it and prepares it for whatever kind of dissemination. Everybody involved has to take their individual responsibility so that we achieve collective responsibility.


I never make any of those assumptions (notice how you have to quote sentence fragments and even single words to set up your straw man...)

What I assume is that not all humans are perfectly rational individuals whose only goal is profit maximization. And that we cannot see people's souls, so we need laws that err on the side of caution.

I also assume that homo economicus is explicitly disincentivized from reasoning about feedback effects and historical context, which are two things many actual homo sapiens care about (for obvious reasons).

E.g., disregarding feedback loops and the long arc of history, i.e. in a vacuum, supporting racialized slavery is perfectly rational for non-enslaved people interested in profit maximization. Consider also a completely non-racist loan lender with a vested interest in high property values who knows that dark-skinned people lower property values. Even if the data scientist is perfectly unbiased, bias and hatred in the underlying population can result in data-driven, profit-motivated decisions that harm marginal groups. The fact that this really did happen en masse is WHY we have these laws...
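To make the feedback-loop point concrete, here is a deliberately crude toy simulation (all names and numbers invented, not a model of any real lender): a static, "data-driven" approval rule applied to a group that starts slightly worse off keeps worsening that group's observed data, which then justifies the next denial.

    # invented starting default rates for two hypothetical groups
    rate = {"group_x": 0.10, "group_y": 0.12}

    for year in range(5):
        # approve anyone whose observed default rate is under the cutoff
        approved = {g: r <= 0.11 for g, r in rate.items()}
        print(year, approved, rate)
        for g in rate:
            if not approved[g]:
                # a denial makes it harder to build credit, nudging the
                # group's observed default rate up for the next round
                rate[g] = round(rate[g] + 0.01, 3)
    # group_y never gets back under the cutoff: each denial widens the very
    # gap it was based on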

Regarding the latter point, you are effectively dismissing all non-consequentialist ethics and associated legal traditions as "gut feelings". I submit that these gut feelings play an important role in human societies made up of irrational people. In fact, they are even important in societies of perfectly rational people who are not super-reasoners with perfect foresight.



