I don't think anyone has bothered to try explaining what is wrong with this data (in this thread).

The link discusses some response-bias models. For instance, maybe people always lie by +5k. You can figure that out. Or maybe you assume the bias is really a function of salary, something like f(x)*base_salary, and do something structured with their salary as the driver of the bias.
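To make the two bias models above concrete, here is a rough Python sketch. It assumes you have a small validation sample where both the reported and the actual salary are known, which is my assumption and not something the survey provides. The function names and the numbers are made up for illustration, and the proportional model treats f(x) as a single constant k, which is the simplest possible case.

```python
from statistics import mean

def additive_bias(reported, actual):
    """Constant-offset model: reported = actual + b. Estimate b as the mean gap."""
    return mean(r - a for r, a in zip(reported, actual))

def proportional_bias(reported, actual):
    """Proportional model: reported = k * actual.
    Estimate k as the least-squares slope through the origin."""
    num = sum(r * a for r, a in zip(reported, actual))
    den = sum(a * a for a in actual)
    return num / den

# Hypothetical validation sample (invented numbers, not survey data).
actual = [60_000, 80_000, 100_000, 120_000]
reported_plus_5k = [a + 5_000 for a in actual]   # everyone overstates by 5k
reported_times_k = [a * 11 // 10 for a in actual]  # everyone overstates by 10%

b = additive_bias(reported_plus_5k, actual)
k = proportional_bias(reported_times_k, actual)

# Undo the estimated bias on the reported figures.
corrected_additive = [r - b for r in reported_plus_5k]
corrected_proportional = [r / k for r in reported_times_k]
```

Which model fits better is an empirical question; with real data you would compare the residuals of each fit rather than pick one up front.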

It's, of course, perfectly fine to interpret this kind of survey as the responses of those in the population who decided to take it. In this case, that's readers of Hacker News who filled it in. I certainly wouldn't do that; I only registered to post this comment... anyway.

You could also try to validate this data against any other survey data, or by cross-linking LinkedIn data with mortgage data for a true dataset.

"And that's data science, bro."