While we're discussing making Tarsnap more user-friendly:
The single reason (other than, er, fecklessness on my part) why I haven't tried Tarsnap is that it seems difficult to predict how much using Tarsnap would actually cost me. I would need to know (1) how much space my data will take up after compression and deduplication, and (2) how much bandwidth my incremental backups will need -- again, after compression and deduplication.
Wouldn't It Be Nice If there were a "predict my Tarsnap costs" tool? You download it and point it at your data. It compresses it and identifies duplicated blocks, and says "You would initially be storing about 25 GB of data on the Tarsnap servers, which will cost you about $6.70 per month." It records a bunch of hashes. Then, a little later, you run it again. It identifies changed blocks and compresses the differences (or something), and says "If you do an incremental update like this, you will transfer about 2GB of data, which will cost you about $0.54." With, each time, a disclaimer: "This is only a crude estimate and if you think Tarsnap, Inc., will be in any way bound by it then you're out of your mind."
I'm not sure whether this is a thing that any random third-party person could write, or whether getting the numbers right would require information about exactly what happens on the Tarsnap servers that only Colin has. Perhaps the answer is that doing it right requires secret information, but doing it well enough (e.g., getting within 20%, 95% of the time) is easy with some naive algorithm like "divide everything into 4kB blocks, hash them all, identify duplicates, compress individual unique blocks with gzip; do the same for incremental updates but ignore blocks whose hash hasn't changed".
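The naive algorithm described above is straightforward to sketch in Python. This is only an illustration of the idea, not anything Tarsnap actually does: fixed 4 kB blocks and gzip are rough stand-ins for Tarsnap's real variable-size chunking and compression, so the numbers it produces should be treated as ballpark at best.

```python
import gzip
import hashlib
import os

BLOCK_SIZE = 4096  # naive fixed-size blocks; Tarsnap really uses variable-size chunking

def estimate(root):
    """Walk `root`, hash 4 kB blocks, and report raw, deduplicated,
    and gzip-compressed sizes in bytes."""
    seen = set()
    raw = dedup = compressed = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(BLOCK_SIZE)
                        if not block:
                            break
                        raw += len(block)
                        digest = hashlib.sha256(block).digest()
                        if digest not in seen:
                            # First time we've seen this block: count it
                            # toward the deduplicated and compressed totals.
                            seen.add(digest)
                            dedup += len(block)
                            compressed += len(gzip.compress(block))
            except OSError:
                pass  # unreadable file; skip it
    return raw, dedup, compressed
```

Running `estimate("/home/me")` and multiplying the compressed total by Tarsnap's per-GB storage price would give the kind of first-backup estimate described above; repeating the hashing pass later and counting only new block hashes would give a crude incremental estimate.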
I'm curious: is this a factor stopping other people signing up with Tarsnap? Is it perhaps only a factor for cheapskate individuals like me, and not for the larger organizations that probably represent most of Tarsnap's profits?
how much space my data will take up after compression and deduplication
Valid point. You can use tarsnap's --dry-run option (along with --print-stats) to find out how much space your data will take up; unfortunately you need to have a key file before you can run that, and you can't create a key file until you've created a tarsnap account and added money to it.
I will be adding a mechanism to allow people to run "keyless" dry runs so that you can install tarsnap and find out how much space your data would take before you spend any money.
Would I be right in thinking that, if the storage cost is small enough not to bother me, then the bandwidth costs are extremely unlikely to be a problem?
The bandwidth used to upload data is going to be around 1% more than the size of the data (protocol overhead). So to a close approximation the cost of the bandwidth for uploading is going to be the same as the cost of a month of storage.
Yes. (Although even with incrementals, until you start deleting data your total bandwidth usage to date will be roughly the same as your current monthly storage.)
The deltas were completely made up. If I knew they were right, I'd sign right up today. (I think.) What I don't want to do is guess at the deltas, sign up for Tarsnap, start up my first backup or my first incremental thereafter, and find I'm on the hook for hugely more than I originally budgeted.
I think the way it actually works, what would happen is that after my first backup completes my account would go overdrawn and I would lose access to Tarsnap. And then I could walk away and all I'd actually lose would be whatever I'd prepaid. But -- and this is surely irrational, but I bet it isn't only my brain that works this way -- I would then feel really bad because (1) I would be failing to pay for something I had bought and (2) I would have spent money and got nothing in return. Even though (1) Colin surely budgets for a certain amount of attrition and (2) the sum of money involved needn't be bigger than $5.
I don't want to do something that seems like it has a substantial chance of leaving me feeling simultaneously guilty and wasteful.
I did the math and I see your point. I'm also a CrashPlan user.
Right now I'm paying CrashPlan about $60/year. My best guess at Tarsnap costs is about 10x that, $600/year.
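For what it's worth, that 10x figure is easy to sanity-check against Tarsnap's published $0.25/GB-month storage rate (assuming that rate, and ignoring bandwidth): a $600/year budget implies roughly 200 GB at rest.

```python
TARSNAP_PER_GB_MONTH = 0.25  # dollars; published storage rate, assumed current

yearly_budget = 600.0        # the ~10x-CrashPlan estimate above
implied_storage_gb = yearly_budget / 12 / TARSNAP_PER_GB_MONTH
# About 200 GB stored -- plausible for a whole-disk backup, which is
# exactly the use case where Tarsnap's metered pricing gets expensive.
```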
Normally I'm a very privacy-conscious person, but that difference does give me pause. I could probably slim down what I do with CrashPlan to only personal files rather than the whole disk, but why would I want to do that? The whole point is to "fire and forget", and restore painlessly.
Perhaps Tarsnap makes sense if you're Stripe, with lots of sensitive data to store, easily partitioned from your other data. But if my calculations are correct, probably not if you're an average individual.
Does CrashPlan dedup and compress the data? Also I have to say, tarsnap is so much more lightweight than CrashPlan. If they came out with a nice command line client I would have to reconsider them.
When I last tried to do a calculation, I came out at well over $100 per month which, yes, I can afford, but is way more expensive than what I'm paying CrashPlan. There are many reasons I'd rather use Tarsnap than CrashPlan, but it's hard to justify that price difference.
This might be one of those situations where the simplest and cheapest way to figure out if something will work for you is to just try it for a bit. You can always stop, but right now you're at risk of data loss. Perfect is the enemy of good, worse is better, etc.
Consider: Your current approach is basically to model the problem, evaluate the model, and then decide whether to use the service. That approach seems to protect you from risk, but it really doesn't. As you've noted, there isn't an existing model, which makes the modeling-first approach too expensive. As a result, you don't have backups, which is riskier than one or two months of abnormally expensive tarsnap service.
Your deciding factor is probably the long-term cost. You can ballpark that by paying for only a month or two and extrapolating. That should be relatively cheap in absolute terms. Most numbers are not outliers, so you're unlikely to be very surprised. You would probably know already if you're atypical in some way. Even if it turns out you'd have an abnormally high cost, figuring that out by evaluating the service will be cheaper than building a model.
Perhaps you're worried that the costs are fine right now but will grow too high over time. If so, that'll happen relatively slowly, over the course of a month or two at the fastest. At worst, you'll have to delete some backups or switch to a different service. That might be annoying, but it won't be overly expensive and it's unlikely to even happen. If it does, at least you will have had affordable backups in the meantime. Right now, you're at risk.
Are you sure something like Backblaze isn't a better fit for you, then?
"Unlimited" backups at $5/month/computer. Install their proprietary agent on your Windows/Mac OS X machine, log in, and forget about it. No Linux support, though.
> (1) how much space my data will take up after compression and deduplication
In my experience, deduplication typically doesn't do much for the first backup. Further backups take approximately as much space as usual incremental backups. So the answer to this is "as much space as your existing backup".
> (2) how much bandwidth my incremental backups will need
I recently looked into this and for me the cost of bandwidth used for daily backups is about 10-15% of the cost of data storage.
Just by way of information for Colin (though I'm not sure how much information it really gives): this comment is currently on +25, which to my mind suggests that there are a reasonable number of others who would also like this feature.
I think the ability to do a dry-run upload without prepaying (which Colin has already said is on the way) will be pretty much a complete solution.
Yeah, I'd much prefer a $10/month for 10 GB of storage plan without any variable pricing.
But honestly, what truly stops me from using Tarsnap is that I have a dev server at home running 24/7, and I can just wait 9-10 hours for it to pull gpg'd backups.
The rest will come once this has had more testing. This is a first public beta and the move of code from private development to public development; it's not ready yet for people who would be scared off by needing to compile it.
And I need to scroll horizontally to read the whole page. Though I guess it's easy to copy/paste into a text editor and read it there since the text is so plain.
I once asked Colin Percival about doing an OS X front end as a commercial product, to be sold independently, because tarsnap is awesome and the user experience isn't.
Nothing came of it because he is sticking to his licensing, which would mean telling users to first install XCode before installing the GUI. It didn't seem like that would be a winner, so I moved on to other things.
Couldn't it be done in the age-old tradition of UNIX/Linux GUI frontends to command line apps? It seems like your installer (or the GUI app itself, on first run) could do all the work of downloading the Tarsnap source, compiling it and then simply launch the CLI program to execute the actions.
It'd take some dev work to set it all up, but could be transparent for the user.
On OS X that requires downloading a 2 GB package (Xcode) first to install the compiler and such. Users would rightly think that's overkill for a backup tool.
Yes. Also, installing the command line tools on a pristine system is easy and does not require downloading Xcode. Typing cc in Terminal will pop up a dialog box that will offer to install the CLT.
I don't get what you're saying. Why would you lose verifiability or security? And the hassle would be only for the developer, not the user.
Oh, the installer would obviously check the source archive against a bundled copy of the current Tarsnap GPG key before compiling. I just wasn't going into implementation details.
cperciva announced that binary packages are coming in the future in the thread that announced the GUI[0]:
> Speaking of broadening Tarsnap's user base: Binary packages (for both tarsnap and this GUI) will happen at some point -- not that I recommend relying on binaries (since you lose the ability to audit the code), but in keeping with the UNIX/X11 philosophy of "tools, not policy" I want to allow users to decide the tradeoff between paranoia and ease of use for themselves.
I bring fresh news from the Tarsnap pit.
I have been working hard for the last 6 months on a desktop application
frontend for the awesome Tarsnap service. Most of you are using Tarsnap as
it was designed, from the command line, usually on the server side and in
scripts, however some people, like me, feel the need to benefit from the same
Tarsnap juice from the comfort of the desktop too, with ease, for common tasks
and swift backups. Another important aspect is that the tarsnap command-line utilities adhere to the Unix philosophy and can easily be used like an API, which makes it simple to build complex and custom backup schemes. That is why I genuinely saw an opportunity to create a backup application that I would be the first user of, and that would put my lack of patience, my distrust, and my overall pessimism regarding existing solutions to rest.
This is where I introduce Tarsnap for the desktop, a cross-platform,
open source (BSD 2 clause) modern desktop application acting as a wrapper
around the Tarsnap command line utilities, written in C++ and using the
Qt 5 framework. You need to install the command line Tarsnap client before
you can use the application. Given that Tarsnap doesn't provide any binary
redistributables for the CLI utilities on any platform at the moment,
there's none for this desktop app either. This might be subject to change
in the future.
The project page and code is hosted at your favorite host, Github:
https://github.com/Tarsnap/tarsnap-gui
To get started all you have to do is:

$ git clone https://github.com/Tarsnap/tarsnap-gui.git && cd tarsnap-gui
$ git checkout v0.5
$ less README
The application currently has 3 main usage patterns:
1. The Backup tab allows you to quickly back up files and directories in a
single-shot fashion.
2. The Archives tab lists all of the archives that have been created using
the current machine key. You can inspect, restore and delete archives from
this view.
3. The Jobs tab. A job is a predefined set of directories and files, as well
as backup preferences, that you know are going to be backed up regularly.
Jobs are persistent (in a local SQLite DB) and you can attend to them
whenever you wish.
The other tabs are Settings and Help, which hopefully are self explanatory. See
the distribution files README, CHANGELOG and COPYING for information on
respective matters.
The current version is 0.5 and is considered beta until otherwise
announced. There are rough edges and lots more ground to cover in terms of
functionality and the breadth and depth of Tarsnap options supported. All
development will now take place in the open, so I'd like to
start the conversation here and encourage contribution and review on GitHub.
Read this Wiki page and the announcement summary on my
blog for more details and some beautiful screenshots:
https://github.com/Tarsnap/tarsnap-gui/wiki/Tarsnap
http://shinnok.com/rants/2015/06/11/tarsnap-frontend-released/
To conclude, if you're a lazy desktop user like me, you're a perfect fit,
be an early adopter and start using it now so we can get it further. :-)
I do advise you to create a new key for this desktop session; it's generally
best practice and a safeguard given that the application is still in beta.
Cheers, Shinnok
It's not directly related to the GUI, but this is the first time I've come across the Tarsnap service. The last release of tarsnap was two years ago. Is it so secure that not even bug fixes have been necessary since then?
Apart from that, the service together with the GUI looks quite promising to me and is totally worth a try.
I've been spending most of my time for the past two years managing the server side of things, but there will be a new release soon and hopefully more active client development after that. Part of the reason this GUI is on github is that I'm in the process of moving my local client code tree up there (i.e., going through the fixes I've made over the past two years and filling in commit messages).
Where are you getting this from? The latest release on the website is from February 2015, as near as I can tell. Or at least, that's when it was signed.
There's a mention of scheduling backups in the Github repo, as an upcoming feature, but in the meantime, here's my tarsnap-cron: https://github.com/pronoiac/tarsnap-cron
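Until scheduling lands in the GUI, a plain crontab entry also works. A minimal sketch (the binary path, archive name, and backed-up directory are illustrative, not taken from tarsnap-cron; `-c -f` are tarsnap's standard create-archive flags):

```shell
# Nightly tarsnap backup at 03:15. Archive names must be unique, hence the
# date suffix; note that % must be escaped as \% inside a crontab entry.
15 3 * * * /usr/local/bin/tarsnap -c -f "nightly-$(date +\%Y\%m\%d)" /home/user
```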
There have been a few attempts at GUIs over the years. I'm hoping that this one will turn out well; I think it's already more functional than any of the others, and it's being actively developed.
All in due time. This is a first public beta; I'd like it to get some testing from the community (including feedback on use cases) before I promote it too much to new users.
Did anyone else get a blocked notice in their corporate bluecoat filter?
Bluecoat blocks the strangest things sometimes but this was amusing to me because I work at a "cloud hosting provider" and essentially tarsnap could be seen as a competitor to some services that we sell, but we don't focus solely on *nix systems.