> The only feature from ZFS that I would like is the corruption detection.
I run ZFS on my main server at home (Proxmox, a Debian-based Linux hypervisor that ships with ZFS) but...
No matter the FS, for "big" files that aren't supposed to change, I append a (partial) cryptographic checksum to the filename. For example:
20240238-familyTripBari.mp4 becomes 20240238-familyTripBari-b3-8d77e2419a36.mp4, where "-b3-" indicates the type of cryptographic hash ("b3" for BLAKE3 in my case, because it's very fast) and 8d77e2419a36 is the first x hex digits of the cryptographic hash.
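For concreteness, here's a minimal sketch of that tagging step. It assumes the third-party blake3 Python package (pip install blake3); the "-b3-<hex>" scheme and the 12-digit truncation are the commenter's own convention, not a standard:

    # Sketch: tag a file with the first 12 hex digits of its BLAKE3 hash.
    # Assumes the third-party `blake3` package; the "-b3-" marker is the
    # naming convention described above, not any standard.
    from pathlib import Path
    import blake3

    def tag_file(path: str, digits: int = 12) -> Path:
        p = Path(path)
        h = blake3.blake3()
        with p.open("rb") as f:              # stream in 1 MiB chunks,
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)              # don't slurp big videos into RAM
        tag = h.hexdigest()[:digits]
        return p.rename(p.with_name(f"{p.stem}-b3-{tag}{p.suffix}"))

    # tag_file("20240238-familyTripBari.mp4")
    # -> 20240238-familyTripBari-b3-8d77e2419a36.mp4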
I play the video file (or whatever file it is) after I've added the checksum, so I know it's good.
I do that for movies, pictures, rips of my audio CDs (although those are also matched against an online "perfect rips" database), etc. Basically, everything that isn't supposed to change and that I want to keep.
I then have a shell script (which I run on several machines) that does random sampling: I pick a percentage of the files carrying such a cryptographic checksum in their filename, and the script verifies that each sampled file still matches its checksum. I don't verify 100% of the files every time; typically I'll verify, say, 3% of my files, randomly, daily.
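The script itself isn't shown; here is a Python re-implementation sketch of the sampling check, under the same blake3-package and naming assumptions (the root path and 3% figure are illustrative):

    # Sketch: randomly verify a percentage of files tagged "-b3-<hex>".
    # Same `blake3` package assumption; root and percent are illustrative.
    import random, re, sys
    from pathlib import Path
    import blake3

    TAG = re.compile(r"-b3-([0-9a-f]+)\.[^.]*$")

    def verify_sample(root: str, percent: float = 3.0) -> None:
        tagged = [p for p in Path(root).rglob("*")
                  if p.is_file() and TAG.search(p.name)]
        n = min(len(tagged), max(1, round(len(tagged) * percent / 100)))
        for p in random.sample(tagged, n):
            want = TAG.search(p.name).group(1)   # hash prefix from the name
            h = blake3.blake3()
            with p.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            if not h.hexdigest().startswith(want):
                print(f"CORRUPT: {p}", file=sys.stderr)

Run daily at 3%, sampling with replacement covers roughly 60% of the library within a month and over 90% within three, so silent corruption doesn't hide for long.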
Does it help? It sure does. For whatever reason, one file was corrupt on one of my systems. It's not too clear why: the file had the correct size, but somehow a bit had flipped. Probably during some sync. And my script caught it.
The nice thing is that I can copy such files to actual backups: DVDs, Blu-rays, the cloud, whatever. The checksum is part of the filename, so I can tell whether my file has changed no matter the OS / backup medium / cloud or local storage / etc.

If you have "bit flip anxiety", it helps ; )
The checksum doesn't help you fix the flipped bit, nor does it tell you which bit flipped. You would have to re-create the file from a complete backup instead of using the efficiency of parity disks. Basically RAID 1 vs. RAID 5.
If you already have some data on ext4 disk(s) and don't want to deal with the issues of using ZFS/BTRFS, then it's a no-brainer. Dynamically resizing the "array" is super simple, and it works really well with MergerFS.
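As a hedged illustration of that point (mount points and options are mine, not from the comment, and assume mergerfs is installed), the whole pool can be a single /etc/fstab line; "resizing" is then just adding another disk that the glob matches:

    # /etc/fstab -- pool several ext4 data disks under one mount point.
    # Adding /mnt/disk4 later grows the pool with no rebuild or resilver.
    /mnt/disk*  /mnt/pool  fuse.mergerfs  cache.files=off,category.create=mfs,moveonenospc=true,fsname=pool  0 0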
If OP is backing up locally onto a ZFS server, as they said they were, and then, say, propagating that data to a cloud provider like Backblaze (which uses ext4), this sort of approach makes sense.
This approach is also good when you have multiple sources to restore from: it makes it easier to determine which one is the new "source of truth."
There's also something to be said for backing up onto a different FS. You don't want to be stung by an FS bug, and if you are, it's good to know about it.
I wonder how hard it would be to detect which single bit was flipped. As ryao noted, in JPEGs it's immediately obvious where the image was corrupted, by visual inspection. Similarly for videos: you only need to inspect the data following a single I-frame. Even for bitmap/text files, you could just scan the entire file, flipping one bit at a time and comparing the result with the checksum.
Unlike e.g. KDFs, checksums are built to be performant, so verifying one is a relatively fast operation. The BLAKE family runs at about 8 cycles per byte[1], so I guess a modern CPU could do [napkin math] some 500-1000 MB per second? Perhaps I'm off by an order of magnitude or two, but if the file in question is precious enough, maybe that's worth a shot?
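A sketch of that brute-force idea, under the same blake3-package assumption: flip each bit in turn, re-hash, and stop when the stored prefix matches (a 12-hex-digit prefix is 48 bits, so a false match is vanishingly unlikely):

    # Sketch: recover a single flipped bit by brute force.
    # Cost is O(bits * size) hashing work, so only viable for small files.
    import blake3

    def find_flipped_bit(data: bytearray, want_prefix: str) -> int | None:
        for i in range(len(data) * 8):
            data[i // 8] ^= 1 << (i % 8)          # try flipping bit i
            if blake3.blake3(bytes(data)).hexdigest().startswith(want_prefix):
                return i                          # data is now repaired in place
            data[i // 8] ^= 1 << (i % 8)          # undo and keep searching
        return None                               # more than one bit differs

The catch is that the work grows with bits times file size, i.e. quadratically: at ~1 GB/s of hashing, a 1 MB file takes on the order of a couple of hours, and a multi-GB video is hopeless. A last resort for small, precious files.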