> The data is collected under a non-reuse agreement
Oh you sweet summer child. Would you by any chance be interested in buying a bridge?
Even assuming that's true and we end up with a lovely, accurate AI captcha system, the big downside is that captchas are breaking the programmable web. I maintain a lot of small software crawlers, from simple notification applets to bigger analytic crawlers, and over the past few years the web has become increasingly hostile towards this. API endpoints are disappearing, which in general makes distributing free apps very difficult. The web crawlers themselves are super simple to write and maintain, but services like Cloudflare, Distil, and captchas break them, and while there are always workarounds, they are very hard to distribute to users (you can't really pack Puppeteer, Selenium, or some other web-engine automation stack in with your app).
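To make the point concrete, here's a minimal sketch of the kind of "simple notification applet" crawler the comment describes, using only the standard library. The URL and keyword are placeholders, not anything from the thread:

```python
# Minimal notification-style crawler: fetch a page and check for a keyword.
# The URL and keyword passed in are illustrative placeholders.
import urllib.request

def check_page(url: str, keyword: str) -> bool:
    """Fetch `url` and report whether `keyword` appears in the body."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return keyword in body
```

A script like this is trivial to write and to hand to users as a single file - which is exactly what breaks once the target site sits behind a JS challenge or captcha wall.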
Public data should be public. These sorts of idiotic measures are not compatible with the web protocol. The web only knows one thing - 1 IP address == 1 person - and that should be encouraged, not dismissed.
> Even assuming that's true and we end up with a lovely, accurate AI captcha system, the big downside is that captchas are breaking the programmable web.
I actually find this argument to be a bit compelling, if I'm being selfishly honest. It's super annoying that crawlers are so awkward to write these days, and I miss the days when they worked better.
> but services like Cloudflare, Distil, and captchas break them, and while there are always workarounds, they are very hard to distribute to users (you can't really pack Puppeteer, Selenium, or some other web-engine automation stack in with your app).
I don't disagree, but I also think we may be asking to keep Model Ts or gasoline-driven one-person bikes around. These technologies made more sense once, but make much less sense now.
> Public data should be public.
Sure, but what you don't get to mandate is how they're public. If someone wants to make public information available in a specific way and you don't like that way, the burden is on you to republish it. Outside of a very narrow accessibility scope, I'm neither legally nor morally obligated to cater to your specific needs. And indeed, as a service or data provider, I have my own problems.
The threat CAPTCHAs are solving is by no means imaginary. This is not a classic phantom security issue that statism uses to justify authoritarianism. It's equivalent to locking my doors when I leave the shop, or making sure my wares are properly labeled and not spoiled.
> These sorts of idiotic measures are not compatible with the web protocol.
The web protocol as you envision it hasn't been compatible with reality for a long time now. Hell, your crawlers are themselves a violation of the spirit of the original web. You are part of the very problem you're railing against!
> The web only knows one thing - 1 IP address == 1 person - and that should be encouraged, not dismissed.
I suspect this statement is why you got downvoted, for what it's worth.
No, I'm getting downvoted because Hacker News is a notoriously pro-corporate medium - of course people here don't care about public data and data freedom.
> It's super annoying that crawlers are so awkward to write these days, and I miss the days when they worked better.
It has never been easier to write crawlers, with the exception of purposefully built-in barriers. Just look at youtube-dl.
> I don't disagree, but I also think we may be asking to keep Model Ts or gasoline-driven one-person bikes around. These technologies made more sense once, but make much less sense now.
What are you on about? To get around some crawler protections, for example, you need to execute JS with a specific stack of libraries. Distributing crawler.py versus distributing a whole browser-automation stack is much more difficult.
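The asymmetry here is that a plain-HTTP crawler can easily *detect* that it has hit a JS challenge page, but *solving* one requires shipping a full browser engine. A rough sketch of the detection side - the marker strings below are illustrative guesses, as real anti-bot pages vary:

```python
# Sketch: a plain-HTTP fetch can detect, but not solve, a JS challenge page.
# The marker strings are illustrative; real challenge pages differ by vendor.
def looks_like_js_challenge(body: str) -> bool:
    """Heuristically guess whether a response body is an anti-bot interstitial."""
    markers = ("Checking your browser", "cf-challenge", "jschl-answer")
    return any(marker in body for marker in markers)
```

Everything past this check is where the "whole stack" comes in: actually passing the challenge means evaluating the served JavaScript, which is exactly what you can't bundle into a single-file crawler.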
Your logic makes absolutely no sense. On the web there is no distinction as to who is behind an IP address. It's a net of IP addresses and headers, right? If I'm asking for a resource that you choose to serve publicly, I only need to give you my IP and some HTTP cruft, right? So now it turns out you don't want to serve _some_ IP addresses.
Now you have to introduce an extra layer that is not part of the web - a layer that is incompatible with your goal. You need to use JavaScript to fingerprint your client - except, you know what? The client is the one executing your fingerprinting code, so they can send whatever they want back to you.
I've never seen a more idiotic medium. On one hand I get job security; on the other, the web is absolutely broken by complete buffoons with zero logical capabilities.