yooogurt's comments

> if you want to back this file up regularly with something like restic, then you will quickly end up in a world of pain: since new mails are not even appended to the end of the file, each cycle of takeout-then-backup essentially produces a new giant file.

As I'm sure the author is aware, Restic will do hash-based chunking so that similar files can be backed up efficiently.

How similar are two successive Takeout mboxes?

If the order of messages within an mbox is stable, and new emails are inserted somewhere, the delta update might be tiny.

Even if the order of the mbox's messages is ~random, Restic's delta updates should still dedupe large attachments.

It would be great to see empirical figures here: how large is the incremental backup after a month's emails? How does that compare for each backup strategy?
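
If you want a rough, do-it-yourself estimate before trusting any particular tool, a toy content-defined chunker can approximate what restic-style dedup would see between two Takeout runs. This is only a sketch: the gear-hash chunker below is not restic's actual algorithm, the ~1MB average chunk size is just a parameter choice, and the Takeout filenames are hypothetical (Python 3.8+ for the walrus operator).

    import hashlib
    import random

    random.seed(0)
    GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte gear table
    MASK = (1 << 20) - 1  # boundary when low 20 bits are zero -> ~1 MiB average chunks

    def chunk_hashes(path):
        """Split a file with a gear rolling hash; return SHA-256 digests of the chunks."""
        digests, buf, h = [], bytearray(), 0
        with open(path, "rb") as f:
            while (block := f.read(1 << 20)):
                for b in block:
                    h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
                    buf.append(b)
                    if h & MASK == 0:  # chunk boundary
                        digests.append(hashlib.sha256(buf).hexdigest())
                        buf.clear()
        if buf:
            digests.append(hashlib.sha256(buf).hexdigest())
        return digests

    # Hypothetical filenames for two successive Takeout runs.
    old = set(chunk_hashes("takeout-jan.mbox"))
    new = chunk_hashes("takeout-feb.mbox")
    shared = sum(1 for d in new if d in old)
    print(f"{shared}/{len(new)} chunks of the new mbox already exist "
          f"(~{100 * shared / max(len(new), 1):.0f}% dedup by chunk count)")

It's slow on multi-gigabyte mboxes and measures chunk counts rather than bytes, so treat it as a rough indication, not a benchmark of restic itself.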

The pros of sticking with restic are simplicity, and avoiding the risk of your own tool managing to screw up the data.

That risk isn't so bad with a mature tool that canonicalises mboxes (e.g. orders messages by time), but it seems real for something handrolled.


> As I'm sure the author is aware, Restic will do hash-based chunking so that similar files can be backed up efficiently.

> Even if the order of the mbox's messages is ~random, Restic's delta updates should still dedupe large attachments.

I forget the exact number, but the rolling hashes for Restic and Borg are tuned to produce chunk sizes on the order of an entire megabyte.

That means attachments need to be many megabytes for Restic to be of much use, since a full chunk has to fall entirely within the attachment. On average you'd lose ~0.5MB at each end of an attachment, so a 5MB file would only be ~80% deduped.
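
As a sanity check on that 80% figure, here's the back-of-the-envelope model in a few lines of Python. It assumes ~1MB average chunks and ignores restic's actual min/max chunk bounds, so it's an approximation, not a statement about restic's real behaviour:

    def expected_dedup_fraction(attachment_mb, avg_chunk_mb=1.0):
        # On average the partial chunks at each end (~avg_chunk_mb/2 each) won't
        # match the previous snapshot, so roughly one chunk's worth of data is lost.
        return max(0.0, (attachment_mb - avg_chunk_mb) / attachment_mb)

    for size_mb in (2, 5, 10, 50):
        print(f"{size_mb:>3} MB attachment -> ~{expected_dedup_fraction(size_mb):.0%} deduped")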

Nothing against Restic, but it's tuned for file-level backup, and I'm sure it wouldn't be as performant if it used chunks that were small enough to pick apart individual e-mails.

I suggested the author check out ZPAQ, which has a user-tunable average fragment size, and is arguably even simpler than Restic.

The ZPAQ file can then itself be efficiently backed up by Restic.


If you haven't heard of Rich Hickey, then you're fortunate to have the opportunity to watch "Simple Made Easy" for the first time: https://m.youtube.com/watch?v=LKtk3HCgTa8


I have seen similar critiques applied against digital tech in general.

Don't get me wrong, I continue to use plain Emacs to do dev, but this critique feels a bit rich...

Technological change changes lots of things.

The jury is still out on LLMs, much as it was for so much of today's technology in its infancy.


AI has an image problem around how it takes advantage of other people's work, without credit or compensation. This trend of saccharine "thank you" notes to famous, influential developers (earlier Rob Pike, now Rich Hickey) signed by the models seems like a really glib attempt at fixing that problem. "Look, look! We're giving credit, and we're so cute about how we're doing it!"

It's entirely natural for people to react strongly to that nonsense.


Every time I try to have this conversation with anyone, I become very aware that most developers have never spent a single microsecond thinking about licenses or rights when it comes to software.

To me it's obviously infuriating that a creator can release something awesome for free, with just a small requirement of copying the license attribution into the output, and the consumers of it can't even follow that small request. It should be simple: if you can't follow that, then don't use it, don't ingest it, and don't output derivatives of it.

Yet when I have this discussion with nearly anyone, I'm usually met with "what? license? it's OSS. What do you mean I need to do things in order to use it, are you sure?". Tons of people use MIT-licensed code and distribute binaries but have never copied the license into the output as required. They are simply and blissfully unaware that there is this largely-unenforced requirement that authors care deeply about and that LLMs violate en masse. Without understanding this, they think the authors are deranged.


Small? GPLv3 is ~5644 words, and not particularly long for a license.

Isn't this more easily explained by supply-demand? Supply can't quickly scale, and so with increased demand there will be increased prices.


Imagine someone goes to the supermarket and buys up almost all the tomatoes. Then the supermarket owner says, "I don't know, he bought them all at once, so it was a better sale", and sells the remaining 10% of tomatoes at a huge markup.


I think it is better compared to Dutch folks buying up all the tulip bulbs, and the price skyrocketing.


Tulips were, by my understanding, more like NFTs: rich people gambling when bored, with promises of tulips in the future, i.e. futures contracts for tulips. And prices were high because the buyers were insanely rich merchants.

The RAM situation looks like cornering the market. Probably something OpenAI should be prosecuted for if they end up profiting from it.


Alternatively, we'll see a drop in deployment diversity, with more and more functionality shifted to centralised providers that have economies of scale and the resources to optimise.

E.g. IDEs could continue to demand lots of CPU/RAM, and cloud providers can deliver that more cheaply than a mostly idle desktop.

If that happens, more and more of their functionality will come to rely on low datacenter latencies, making desktop use less viable.

Who will realistically be optimising build times for use cases that don't have sub-ms access to build caches? And when those build caches are available, what will stop the median program from having an even larger dependency graph?


I’d feel better about the RAM price spikes if they were caused by a natural disaster and not by Sam Altman buying up 40% of the raw wafer supply, other Big Tech companies buying up RAM, and the RAM oligopoly situation restricting supply.

This will only serve to increase the power of big players who can afford higher component prices (and who, thanks to their oligopoly status, can effectively set the market price for everyone else), while individuals and smaller institutions are forced to either spend more or work with less computing resources.

The optimistic take is that this will force software vendors into shipping more efficient software, but I also agree with this pessimistic take, that companies that can afford inflated prices will take advantage of the situation to pull ahead of competitors who can’t afford tech at inflated prices.

I don’t know what we can do as normal people other than making do with the hardware we have and boycotting Big Tech, though I don’t know how effective the latter is.


> companies that can afford inflated prices will take advantage of the situation to pull ahead of competitors who can't afford tech at inflated prices

These big companies are competing with each other, and they're willing and able to spend much more for compute/RAM than we are.

> I don’t know what we can do as normal people other than making do with the hardware we have and boycotting Big Tech, though I don’t know how effective the latter is.

A few ideas:

* Use/develop/optimise local tooling

* Pool resources with friends/communities towards shared compute.

I hope prices drop before projects' dev tools all move to the cloud.

It's not all bad news: as tooling/builds move to the cloud, they'll become available to those who have thus far been unable or unwilling to pay for a fast computer that sits mostly idle.

This is a loss of autonomy for those who were able to afford such machines though.

