Hi,
reading here and there, I’m getting more and more scared about bit rot and similar problems (firmware errors? worn-out SSD NAND?): the disks seem to work fine, but the data might be silently corrupted, so my simple backup strategy, which consists of two rsynced copies kept in different places and made at different times, one on SSD and another on HDD, may not be enough. Please note that all my personal data is on ext4 filesystems and amounts to less than 1 TB (OK, it’s not a datahoarding size, but this is the sub where the experts on the subject are). Maybe the probability is low, and the probability that a critical file is affected is even lower, but you know Murphy? I do.
Now, the gold-standard solution would be to replace all of my physical servers with ones that support ECC RAM; then I’d have to buy at least 3 CMR disks to build a ZFS RAID or a similar btrfs one. In practice this solution is not sustainable because of time, space and cost, so I have to accept the risk and settle for a second-best solution… but which? I would also like to avoid using other media types (that is, optical).
For example, might using a backup tool (restic/kopia or Proxmox Backup Server) reduce the risk? I ask because an incremental approach might allow me to restore data from a selected point in the past. Of course, I have no way to find that point in the past and, moreover, I would lose all data produced after that point. Maybe I could apply this strategy just to a subset of very critical and immutable data (official documents)? Or, for these documents, could I just use rsync with the checksum option?
As usual, thanks for any suggestion!
Checksum your files, as you say. Just md5sum or sfv tools will do this and are quick and easy to check at any time.
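If you would rather script it than use md5sum directly, here is a minimal Python sketch of the same idea: build a manifest of digests once, then re-check it whenever you like. The function names (`file_digest`, `build_manifest`, `verify_manifest`) are made up for this example, and it defaults to SHA-256 instead of MD5, which is my own swap; pass `algorithm="md5"` if you want to match md5sum output.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest, algorithm="sha256"):
    """Return (missing, mismatched) lists of relative paths."""
    root = Path(root)
    missing, mismatched = [], []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            missing.append(rel)
        elif file_digest(path, algorithm) != expected:
            mismatched.append(rel)
    return missing, mismatched
```

You would typically dump the manifest to a JSON file kept outside the tree being checked, and run `verify_manifest` against both the source and each backup copy. Note that a mismatch only tells you the content changed, not whether the change was a legitimate edit or corruption.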
If you want a way to recover as well, create a certain % of parity files with a tool like par2. If you’re just worried about the occasional flipped bit, a very small % of parity will go a really long way. Creating par2 files is NOT super fast, but you only have to do it once.
This is basically file-level RAID.
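As a rough sketch of how the par2 step could be scripted, the Python below just assembles and runs par2 command lines. It assumes the par2cmdline `par2` binary is installed and on your PATH, and that it accepts the `create`, `verify` and `repair` subcommands and the `-r<n>` redundancy-percentage option; the helper names and the 5% default are made up for this example, so check them against your installed version.

```python
import shutil
import subprocess


def par2_create_cmd(par2_file, files, redundancy_percent=5):
    """Build the command that creates parity data for the given files."""
    if not 0 < redundancy_percent <= 100:
        raise ValueError("redundancy_percent must be in (0, 100]")
    return ["par2", "create", f"-r{redundancy_percent}",
            str(par2_file), *[str(f) for f in files]]


def par2_verify_cmd(par2_file):
    """Build the command that checks the files against the parity set."""
    return ["par2", "verify", str(par2_file)]


def par2_repair_cmd(par2_file):
    """Build the command that attempts to repair damaged files."""
    return ["par2", "repair", str(par2_file)]


def run(cmd):
    """Run a command and return its exit code (0 is treated as success)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode
```

For example, `run(par2_create_cmd("photos.par2", ["a.jpg", "b.jpg"]))` would create the parity set once, and `run(par2_verify_cmd("photos.par2"))` can be repeated later; a nonzero exit code is a hint to try the repair command.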
Interesting, I will dig deeper. I think I should add to or rephrase the question, or I am missing something: I first have to make sure, or check, that the source files are not bit-rotten, otherwise all backups will contain bit-rotten data, right? I mean, I could checksum, but on working data how can I get evidence of it? And on “cold” data, e.g. pictures, how can I be sure that corrupted source files are not copied onto the backups? I can’t checksum 1 TB of data every night, because of the time it takes…