> This seems like a lot of zesty made-up assumptions.
Nope.
As for the second half of my post, anyone who has been seriously involved with large carrier-neutral facilities will likely agree with me.
It is a fact that IA will be incurring a premium by DIYing, and as I quite clearly spelt out, I am NOT trying to say they are wrong; I am just genuinely curious what that premium is.
Regarding my comment about large non-profits: this is from personal experience. Once they get to a certain size, non-profits do switch to a business mentality. You might not like that fact, but it is a fact. They will more often than not have management boards who are "competitively remunerated". They will almost always actively manage their spare cash (of which they will have a large surplus) in investment portfolios. Things will be budgeted and cost-centered just like in larger businesses. They will have in-house legal teams, or external teams on retainer, to write up philanthropic contracts and aggressively chase after donations people leave them in wills. And so on.
You absolutely cannot ascribe to a large non-profit the same mindset as your local community mom & pop non-profit that operates hand to mouth on a shoestring.
That is why I discourage people from donating to large non-profits. You might feel good donating $100, but in reality it's a sum that wouldn't even be a rounding error on their financial reports. And in the majority of cases, most of your donation is more likely to go toward management expenses than toward the actual cause.
Large non-profits are more interested in large corporate philanthropic donations, preferably multi-year agreements. They have more than enough money for the immediate future (the next 12–18 months); what they want is large chunks of future money in the pipeline, and that's what the large philanthropic agreements give them.
Of course they are. Had to block anything at work coming from one certain company because it wasn't respecting robots.txt and the bill was just getting silly.
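For context, "respecting robots.txt" just means checking the policy before fetching. A minimal sketch of what a polite crawler does (the bot name and URLs here are placeholders):

    # A polite crawler consults robots.txt before each fetch.
    # "ExampleBot/1.0" and example.org are placeholders.
    from urllib import robotparser

    rp = robotparser.RobotFileParser("https://example.org/robots.txt")
    rp.read()  # fetch and parse the site's policy

    url = "https://example.org/some/page"
    if rp.can_fetch("ExampleBot/1.0", url):
        print("allowed: fetch it (and honor any Crawl-delay)")
    else:
        print("disallowed: a well-behaved crawler skips this URL")

Crawlers that skip this check are the ones that end up blocked at the edge by User-Agent or IP range.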
We absolutely lap them with many, many more petabytes of material. But archive.today is also not doing speculative or repeated scheduled captures across anywhere near the number of sites that archive.org is.
From an article about "infrastructure" that opens with a dramatic description of a datacenter stuffed into an old church, I would expect more than the generic clipart you'd see in the back half of Wired magazine.
That's super cool!
Can the IA building be visited by random people like myself? Next time I'm in SF (who knows when that will be, though) I'd very much like to visit it!
There was a lot of renovation. One day they fired up the pipe organ (which still works) inside the building as well as the servers and the transformer for the street blew up. That was a legendary day.
No regular residential building is set up to host a datacenter off the bat. Even racking more than half a dozen boxes in a given room requires an upgrade.
Most rooms in North America won't be wired for anything over 2.5 kW by default (kitchens and laundry rooms being obvious exceptions).
An electric dryer might pull 5 kW. An electric range, ballpark 10 kW. Versus 15 kW per full rack for a fairly tame setup (rough arithmetic below).
And then you've got the problem of dissipating all that heat.
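To put rough numbers on the circuit math above (a sketch, assuming typical North American branch-circuit ratings and the common 80% continuous-load derating):

    # Rough power-budget arithmetic for the figures above. Circuit
    # ratings are typical North American values; the 0.8 factor is
    # the usual continuous-load derating.
    CIRCUITS = {
        "15A/120V outlet": 15 * 120,   # 1,800 W
        "20A/120V outlet": 20 * 120,   # 2,400 W
        "30A/240V dryer":  30 * 240,   # 7,200 W
        "50A/240V range":  50 * 240,   # 12,000 W
    }
    RACK_W = 15_000  # a "fairly tame" full rack

    for name, rated in CIRCUITS.items():
        usable = rated * 0.8
        print(f"{name}: {usable/1000:.2f} kW continuous -> "
              f"{RACK_W/usable:.1f} circuits per rack")

So even a dedicated range circuit covers well under one full rack, before you count cooling.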
That's sad, but it mirrors my experience with commercial customers. Tape is fiddly, but its cost efficiency for large amounts of data and its at-rest stability are excellent. Unfortunately, tape is caught in a spiral of decreasing market share, so the industry has little incentive to optimize it.
Edit: Then again, I recently heard a podcast that talked about the relatively good at-rest stability of SATA hard disk drives stored outdoors. >smile<
Tape is also an extraordinarily poor option for a service like the Internet Archive, which intends to provide interactive, on-demand access to its holdings.
Back in the day, if you loaded a page from the web archive that wasn’t in cache, it’d tell you to come back in a couple of minutes. If it was in cache, it was reasonably speedy.
The cache in this case was the hard drives. If I recall correctly, we were using SAM-FS, which worked fairly well for the purpose even though it was slow as dirt: we could effectively mount the tape drive on Solaris servers and access the file system transparently.
Things have gotten better. I’m not sure if there were better affordable options in the late 1990s, though. I went from Alexa/IA to AltaVista, which solved the problem of storing web crawl data by being owned by DEC and installing dozens of refrigerator-sized Alpha servers. Not an option open to Alexa/IA.
This is a common use for tape, which can, via tools like HPSS, have a couple of petabytes of disk in front of it and present the whole archive in a single POSIX filesystem namespace, handling data migration transparently and making sure hot data is kept on low-latency storage.
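The read path of such a hierarchical store looks roughly like this (a hypothetical sketch, not the HPSS API; every name below is made up):

    from pathlib import Path

    # Hypothetical hierarchical-storage read path: hot files are
    # served from a disk cache; cold files are recalled from tape
    # first. None of these names come from HPSS itself.
    DISK_CACHE = Path("/hsm/disk-cache")

    def recall_from_tape(name: str, dest: Path) -> None:
        """Stand-in for a tape recall: mount the cartridge, seek,
        and stream the file onto disk. Latency: seconds to minutes."""
        raise NotImplementedError("tape library integration goes here")

    def read(name: str) -> bytes:
        cached = DISK_CACHE / name
        if not cached.exists():           # cache miss: recall from tape
            recall_from_tape(name, cached)
        return cached.read_bytes()        # disk-speed read either way

The point is that the caller just sees a filesystem read; the seconds-to-minutes tape latency only shows up on a cold miss.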
Perhaps? But unless tape, and the infrastructure to support it, is dramatically cheaper than disk, they might still be better served by more disk: having two or more copies of data on disk means both of them can service load, whereas a tape copy is only passively useful as a backup.
> unless tape, and the infrastructure to support it, is dramatically cheaper than disk,
This turns out to be the case, with the cost difference growing as the archive size scales. Once you hit petascale, it's not even close. However, most large-scale tape deployments also have disk involved, so it's usually not one or the other.
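Back-of-the-envelope illustration (all prices below are assumptions for the sake of arithmetic, not quotes):

    # Illustrative media-plus-hardware cost at three archive sizes.
    # $/TB figures and drive/library prices are rough assumptions.
    DISK_PER_TB = 15.0      # large nearline HDDs
    TAPE_PER_TB = 5.0       # LTO-class media
    FIXED_TAPE = 4 * 5_000 + 50_000   # four drives + a small library

    for tb in (1_000, 10_000, 100_000):   # 1 PB, 10 PB, 100 PB
        disk = tb * DISK_PER_TB
        tape = tb * TAPE_PER_TB + FIXED_TAPE
        print(f"{tb/1000:>4.0f} PB: disk ${disk:>12,.0f}  tape ${tape:>12,.0f}")

    # At 1 PB disk wins (tape's fixed costs dominate); by 100 PB tape
    # is well under half the cost, and the gap keeps widening.

The fixed drive/library costs amortize away as capacity grows, which is why the gap widens with scale.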
You might squirm at using refurbished or used media, but those 3TB SAS ex-enterprise disks are often the same price as or cheaper than tapes themselves (excluding tape drive costs!). Will magnetic storage last 30 years? Probably not, but it doesn't instantly demagnetize either. Both tape and offline magnetic platters benefit from ideal storage conditions.
It's not just cost per unit of media, though. Automated handling is a big advantage, too. At the scale where tape makes sense (north of 400TB in retention), I think the inconvenience of handling disks with similar aggregate capacity would be significant.
I guess slotting disks into a storage shelf is similar to loading a tape-changer robot. But I can't imagine the backplane slots on a disk array being rated for a significant lifetime number of insertion/removal cycles.
If you're ok with individual storage units as small as 3TB, then we're talking about a different set of needs. At that scale, whatever you can lay hands on is probably fine. Used tape is also cheaper than new. IA is dealing with petascale, which is why I mentioned that the price difference widens with scale.
Tape is almost always used for cold-storage backups that are kept offline in case of ransomware attacks. Using it for on-demand access would be insanely slow.
You only have to be about 30% correct with Internet Archive criticism to enjoy unfettered, sometimes problematic commentary with little pushback.
Maybe you should just take whatever your version of the W is.