Would you recommend building something like this for a much smaller system? 10TiB or so maybe, I do not need that much, or do you think buying a NAS of some kind would be better?

I kind of want to set something like this up while spending the least amount of money. I am comfortable enough with Debian/Linux to do most things, but I have never managed anything like this. In the end I want to end up with somewhere relatively safe to store data pretty much in the same way you are, I just do not need 70TiB, and I have no experience with ZFS/hardware stuff/storage.



By "something like this", do you mean ZFS? I am a HUGE fan of ZFS, and I do think that it's worth using in any situation where data integrity is a high priority.

As far as ZFS on Linux, it still has its wrinkles. I use it because, like you, I'm comfortable with Debian, and I didn't want to maintain a foreign system just for my data storage, and I still wanted to use the machine for other things too. (I actually started with zfs-fuse, before ZFS on Linux was an option.)

So, I don't know. If you just want a box to store stuff on, you might want to just look into FreeNAS, which is a FreeBSD distribution that makes it very easy to set up a storage appliance based on ZFS. FreeBSD's ZFS implementation is generally considered production-ready, so you avoid some ZFS on Linux wrinkles, too.

So, I'd recommend checking out the FreeNAS website, and maybe also http://www.reddit.com/r/datahoarder/ for ideas/other opinions. I do a lot of things in weird idiosyncratic ways, so I'm not sure I'd recommend anyone do it exactly how I have. :)


If you're comfortable with Debian then you shouldn't have too many issues with FreeBSD as there is a lot of transferable knowledge between the two (FreeBSD even supports a lot of GNU flags which most other UNIXes don't).

Plus FreeBSD has a lot of good documentation (and the forums have proven a good resource in the past too) - so you're never going it alone (and obviously you have the usual mailing groups and IRC channels on Freenode).

While I do run quite a few Debian boxes (amongst other Linux systems), I honestly find my FreeBSD server to be the most enjoyable / least painful platform to administer. Obviously that's just personal preference, but I would definitely recommend trying FreeBSD to anyone considering ZFS.


As far as I'm concerned, the most identifiable characteristic of Debian is the packaging system, dpkg/apt. I've used FreeBSD occasionally, and that's what I always end up missing about Debian. I did consider going with Nexenta or Debian GNU/kFreeBSD, but whatever, ZoL works well enough. :)


FreeBSD 10 has switched to a new package manager, so it might be worth giving it another look next time you're bored and fancy trying something new.

I can understand your preference though. I'm not much of a fan of apt personally, but pacman is one of the reasons I've stuck with Arch Linux over the years - despite its faults :)


I'll keep that in mind; I do sometimes find myself with some time to play with things. :)


By 'something like this' I meant pretty much what you just said: would you do it the same way (your own everything) if you needed a much smaller system, or would you go with something like FreeNAS, as you suggested? I am confident I can get it working well either way, but I would rather not spend half my days having to tweak and worry about stuff working correctly. I understand that it will need maintenance and monitoring of course, but I would much rather be more of an end-user with a working system than the sysadmin who has to fix it all the time. :-)

Thanks for the link, I will take a look there.


Well, if you don't get a kick out of "tweaking and worrying", yes, I definitely recommend FreeNAS. Although I'm confident in my system now, it took a long time to get this way, and I could've saved hundreds of hours by just going with something like FreeNAS (had it existed); I stuck with it because I kinda enjoy doing things the hard way.


I do kind of get a kick out of that, but at the same time I also just want a safe system for storing data. If I end up building something like this I will take a look at FreeNAS! Thanks!


I have a similar setup with 12TB capacity: ext4 over mdadm RAID-6 w/ 2 spare drives. It's specifically set up such that no single failure (including of a SATA expansion card) can bring down the pool. It's been stable for ~2 years, and it's really nice to have that much storage in the house.

You don't need ZFS for this, as cool as it is.


ZFS still protects you from bitrot when compared to ext4 over mdraid. When you get to many terabytes of data, it's almost guaranteed that you're going to lose something to bitrot. In my case, my most recent scrub detected and repaired 1.58MB of bitrot. And in any given month, `zpool status` will show one or two checksum errors as having been corrected in real-time, as I was working with the corresponding files directly.

This is probably the number one thing that excites people about ZFS over any other solution, and it's something that isn't really easily implemented on a standard RAID + standard filesystem arrangement, since this sort of functionality depends on the filesystem knowing about the underlying disk arrangement.

"ZFS uses its end-to-end checksums to detect and correct silent data corruption. If a disk returns bad data transiently, ZFS will detect it and retry the read. If the disk is part of a mirror or RAID-Z group, ZFS will both detect and correct the error: it will use the checksum to determine which copy is correct, provide good data to the application, and repair the damaged copy."

https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
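The mechanism in that quote can be sketched in a few lines. This is a toy model (hypothetical code, nothing like ZFS's actual implementation) of a 2-way mirror where a checksum is kept per block outside the data path, so a silently corrupted copy is caught on read and healed from the good one:

```python
import hashlib

class MirroredStore:
    """Toy sketch of ZFS-style end-to-end checksumming over a 2-way mirror."""
    def __init__(self):
        self.disks = [{}, {}]   # two mirror copies: block_id -> bytes
        self.checksums = {}     # block_id -> digest, kept with the "block pointer"

    def write(self, block_id, data):
        for disk in self.disks:
            disk[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id):
        expected = self.checksums[block_id]
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).digest() == expected:
                # Self-heal: rewrite any mirror copy that no longer matches.
                for other in self.disks:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        raise IOError("unrecoverable: no copy matches the checksum")

store = MirroredStore()
store.write(0, b"important data")
store.disks[0][0] = b"bit-rotted junk"         # silent corruption on one disk
assert store.read(0) == b"important data"      # checksum picks the good copy
assert store.disks[0][0] == b"important data"  # and the bad copy was repaired
```

The key point is that the checksum lives with the metadata, not with the block it covers, so a disk that returns plausible-looking garbage still fails verification.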


How is that any different from standard RAID? That's exactly the problem RAID was created to solve...


ZFS & btrfs detect and fix silent corruption - where no errors are emitted from the hardware.

I think the pertinent question is: when the filesystem goes to read a 4K block, and one drive's copy of this block in the RAID-1 set is different to its counterpart 4K block on another disk, which one wins?


I didn't specify RAID-1. RAID-5 or RAID-6 can reconstruct the correct value in a silent fail.

Honest question: how often do drives silently fail? Drives contain per-sector checksums these days, explicitly to prevent this problem.


I don't know how often they fail, but I will say that the failures that I hope I never see again (or at least, I hope I never see again outside of a ZFS system) are not drive failures, but those involving intermittent disk controller or backplane faults. In comparison to the chaos I've seen this cause on NTFS systems, ZFS copes astonishingly well.


A normal RAID does not check the checksums on read. It only uses them after a device failure.

Also, it may have copies of the data (e.g. RAID 1) but does not know which copy is correct if they differ.
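The ambiguity extends to single parity too, which is easy to demonstrate with made-up numbers. In this sketch of a 4-disk RAID-5 stripe, XOR parity can tell that *something* in the stripe is inconsistent, but replacing any one block (data or parity) with the XOR of the others produces a self-consistent stripe, so parity alone cannot say which block rotted:

```python
from functools import reduce
from operator import xor

# Three data blocks and their XOR parity, as in a 4-disk RAID-5 stripe.
data = [0b1010, 0b0110, 0b1100]
parity = reduce(xor, data)

# Silent corruption: one data block flips a bit; no I/O error is reported.
corrupted = data.copy()
corrupted[1] ^= 0b0001

# A scrub can see that the stripe is inconsistent...
assert reduce(xor, corrupted) != parity

# ...but "repairing" ANY single block by recomputing it from the other
# three yields a stripe that checks out, so single parity cannot
# identify the actual culprit.
for i in range(3):
    others = [b for j, b in enumerate(corrupted) if j != i]
    rebuilt = reduce(xor, others + [parity])     # candidate repair of block i
    assert reduce(xor, others + [rebuilt]) == parity  # always self-consistent
```

A per-block checksum (as in ZFS or btrfs) breaks the tie, because each candidate repair can be verified independently of the parity.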


No, every default mdadm install performs a complete scrub on the first Sunday of the month. Every block of the array is read back and validated. For RAID modes with parity (e.g. RAID-5, RAID-6) it is able to detect and fix the offending disk when a silent error occurs. You can trigger such a scrub whenever you want (I run mine once a week).
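For reference, the scheduled check described above comes from mdadm's packaged cron job; on Debian-family systems the configuration looks roughly like this (exact paths and schedule may vary by release), and a scrub can also be triggered by hand through sysfs:

```shell
# /etc/cron.d/mdadm (Debian, approximately): run checkarray in the early
# hours of the first Sunday of each month
# 57 0 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] \
#     && /usr/share/mdadm/checkarray --cron --all --idle --quiet

# Trigger a scrub of /dev/md0 manually, then watch progress and results:
# echo check > /sys/block/md0/md/sync_action
# cat /proc/mdstat                     # shows "check" progress
# cat /sys/block/md0/md/mismatch_cnt   # non-zero => inconsistencies found
```

Note that `check` only counts mismatches; writing `repair` to `sync_action` is what rewrites inconsistent stripes.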


Scrubbing the entire raid volume is significantly different from scrubbing every piece of data as it gets written/read.

First, in between your monthly/weekly scrubs, your disks/controllers will be silently corrupting data, possibly accumulating errors on multiple devices and resulting in data loss, depending on the RAID type. ZFS detects corruption much more quickly.

Second, traditional RAID recovery means rewriting the entire device to fix a single block. Let's say you're using RAID-5 and you're rewriting parity, and you get another block error. Oops, now you've lost everything. Since disks have an uncorrectable block error rate on the order of 1 in 10^15 bits, you only need a moderately sized array to almost guarantee data loss. ZFS rewrites corrupt data on the fly.
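The back-of-the-envelope math behind that claim is worth making explicit (the array size and rates here are illustrative, not from any particular product): rebuilding means reading every surviving bit once, so the chance of tripping a second, unrecoverable error is 1 - (1 - p)^n over all n bits read.

```python
import math

def p_read_error(tib_read, ure_rate_per_bit):
    """Probability of at least one unrecoverable read error while
    reading `tib_read` TiB, given a per-bit URE rate."""
    bits = tib_read * 2**40 * 8
    # 1 - (1 - p)^n, computed stably for tiny p and huge n
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))

# Rebuilding a hypothetical 12 TiB array: at a 1-in-10^15 rate this is
# already roughly a 1-in-10 chance of a mid-rebuild error; at the
# 1-in-10^14 rate often quoted for consumer drives, it's better than
# even odds.
print(round(p_read_error(12, 1e-15), 2))
print(round(p_read_error(12, 1e-14), 2))
```

Either way, the failure mode is the full-device rewrite itself; ZFS avoids it by repairing only the blocks whose checksums fail.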


Every time you read or write from a RAID volume, it does perform validation and write-back on error detection. I think your mental model of how linux software RAID works needs updating.

I'm not trying to argue that mdadm is better than ZFS, just that in this case they pretty much compare the same.


If the _drive_ reports a read error it will. If there is silent data corruption it won't. You can test this by using dd to corrupt the underlying data on a drive.


Hrm. I'm going to test that.


That's interesting. I've been running a 6x3TB raidz2 for a year or so on WD Reds, with regular scrubs: no bitrot so far, and no checksum errors either.


Almost all of the bitrot I see is on the oldest vdevs, which at this point probably contain mostly only old snapshots that are almost never accessed. My oldest vdevs are... 4-5 years old.



