
Is there anyone archiving all of reddit? Or twitter? I mean even if their terms have changed to not allow it.


> reddit

There used to be one such project (Pushshift), before the Reddit API change. You can download all the data and see all the info on the-eye, another datahoarder/preservationist group:

https://the-eye.eu/redarcs/

> twitter

Not that I know of, and you haven't even been able to archive tweets on the Wayback Machine for YEARS.


Academic Torrents has monthly dumps of all Reddit submissions and comments, even after the API restrictions.



Interesting. You don’t have to be an academic to access these I guess?


They have magnet links and torrent files right there on the pages, so no.


ArchiveTeam was doing that, but their stuff no longer works due to changes at Reddit. The wiki page about it links to some other groups doing Reddit archiving.

https://wiki.archiveteam.org/index.php/Reddit


ArcticShift is a project with that goal. It picks up where PushShift left off when the API changes killed that project.

https://github.com/ArthurHeitmann/arctic_shift



Thanks. I wonder if anyone does this for Hacker News.


I believe there is a dataset in BigQuery, but I haven't looked at it, so I don't know how up to date it is: <https://news.ycombinator.com/item?id=10440502>

Given that Firebase (which powers the API link at the bottom of this page) is a Google property, I cannot possibly imagine why they'd differ
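
For anyone who wants to pull items from that Firebase-hosted API themselves, here is a minimal sketch in Python. The base URL and the `item/<id>.json` path are the ones the public Hacker News API documents; the helper names (`item_url`, `fetch_json`, `walk_items`) are made up for illustration, and there is no rate limiting, retrying, or storage here, so treat it as a starting point rather than a real archiver.

```python
import json
from urllib.request import urlopen

# Base URL of the public Hacker News API hosted on Firebase.
BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL for one item (story, comment, job, poll, ...)."""
    return f"{BASE}/item/{item_id}.json"


def fetch_json(url, opener=urlopen):
    """Fetch a URL and decode the JSON body. `opener` is injectable for testing."""
    with opener(url) as resp:
        return json.load(resp)


def walk_items(start_id, count, fetch=fetch_json):
    """Yield items walking downward from start_id, visiting at most `count` ids.

    Ids for which the API returns null are skipped (hypothetical policy;
    a real archiver might want to record the gap instead).
    """
    for item_id in range(start_id, max(start_id - count, 0), -1):
        item = fetch(item_url(item_id))
        if item is not None:
            yield item
```

The API also exposes `maxitem.json` under the same base URL, which returns the current largest item id, so something like `walk_items(fetch_json(f"{BASE}/maxitem.json"), 100)` would walk the 100 most recent ids.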


Ask OpenAI maybe?



