Edit: In case it wasn’t clear, I wasn’t seeking advice and I’m more than familiar with all the preventative measures that exist. The post is called “What to do if you kill the wrong file”, not “check your backups”. There’s a plethora of information about the latter, even in this post, but virtually nothing on the former. The only edits made were the addition of this clarification and the addition of “without a backup” to the title.
Yep… it happened to me. I killed a docker compose file with 550 lines of God-forsaken yaml less than a week before the project launch, and the most recent backup we had was nearly a month old and would have taken at least a day to get back up to speed. With a stroke of luck, I handled it about as well as I could have for thinking on my feet, and I’d like to share my experience and lessons learned for anyone else who may ever find themselves in these smelly shoes:
Disclaimer! I’m a young engineer still growing my expertise and experience. Some stuff in here may be bad advice or wrong, like my assertion that using dd to pull data off of an unmounted drive doesn’t risk data loss; I’m pretty damn sure of that, but I wouldn’t stake my life (or your data) on it. I’ll happily update this post as improvements are suggested.
IF YOU RM’D THE WRONG THING:
1. Stop all writes to that partition as quickly as possible.
this step has some optional improvements at the bottom
Up to this point I’d been keeping a lazy backup of the deleted file on another partition. To preserve the disk as well as possible and keep the blocks holding the lost file from being overwritten, I cd’d to the backup dir and ran a docker compose down. There were a few stragglers, but docker stop $containerName handled them fine.
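Roughly what that looks like; the paths and container names here are placeholders, not my real setup:

cd /path/to/backup-dir            # the compose backup lives on a different partition
docker compose down               # stop the whole stack
docker ps                         # check for stragglers
docker stop stubborn_container    # stop any leftovers by name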
2. Unmount the partition
The goal is to ensure nothing writes to this disk at all. This, in tandem with the fact that most data recovery tools require an unmounted disk, is a critical step in preserving all hopes of recovering your data. Get that disk off of the accessible filesystem.
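If your layout looks anything like mine did, that’s something along the lines of the below (device and mount point are placeholders):

umount /mnt/app-data                 # fails if something still has files open there
fuser -vm /mnt/app-data              # if it refuses, see what’s still using it
mount -o remount,ro /mnt/app-data    # last resort: at least make it read-only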
3. Save what you have
Once your partition is unmounted, you can use dd or a similar tool to create a backup somewhere else without risking corruption of the data. You should write the image to a different disk/partition if at all possible, but I know that’s not always an option, and /boot can come in handy in an emergency. It would have been big enough to save me if I hadn’t been working on a dedicated app-data partition.
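A minimal sketch, assuming /dev/sdb1 is the now-unmounted partition and there’s a bigger disk somewhere to hold the image (both paths are placeholders):

dd if=/dev/sdb1 of=/mnt/other-disk/appdata.img bs=4M status=progress
# reading the device doesn’t touch it; just make absolutely sure of= points at a DIFFERENT disk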
4. Your sword of choice
It’s time to choose your data recovery tool. I tried both extundelete and testdisk/photorec, and extundelete got some stuff back but not what I was looking for, while also running into seg faults and other issues. Photorec, on the other hand, was truly a gift from the cosmos. It worked like a dream, it was quick and easy, and it saved my sanity and my project.
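Both can be pointed at the dd image instead of the raw device, which keeps the original disk untouched. Roughly, with placeholder paths (check the docs for your versions):

extundelete /dev/sdb1 --restore-all                      # ext3/ext4 only, works on the filesystem itself
photorec /log /d /mnt/other-disk/restore appdata.img     # carves recoverable files out of the image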
5. The search for gold
Use grep -r -e 'term in your file' ./restore/directory to look through everything that’s ever been deleted on the partition, since the beginning of time, for the file you need.
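Worth knowing: photorec dumps everything it carves into numbered recup_dir folders with generic file names, so grepping for a string you know was in the file is really the only way to find it. Something like:

grep -rl -e 'some distinctive string from the file' /mnt/other-disk/restore/
# -l prints only the names of matching files; photorec names everything f0123456.something,
# so pick a search term that won’t match every other yaml it recovered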
It was a scary time for me, and hopefully this playbook can help some of you recover from a really stupid, preventable mistake.
potential improvements
In hindsight, two things could have gone better here:
1. Quicker: I could have shut them down immediately if I was less panicked and remembered this little trick: docker stop $(docker ps -q)
2. Export running config: I could have used ‘docker inspect > /path/to/other/partition’ to aid in the restoration process if I ended up needing to reconstruct the file by hand. I decided it was worth it to risk it for the biscuit, though, and choosing to shut the stack down as quickly as possible was worth the potential sacrifice.
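Concretely, that would have been something like the two lines below; the target path is a placeholder, and it has to live on another partition:

docker stop $(docker ps -q)                                               # stop every running container at once
docker inspect $(docker ps -aq) > /mnt/other-disk/running-config.json    # dump the live config for reconstruction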
If you fight to preserve a running config of some sort, whether k8s, docker, or anything else, MAKE SURE YOU WRITE IT TO ANOTHER PARTITION. It’s generally wise to give an application its own data partition, but hey, you don’t have a usable backup, so if you don’t have a partition to spare, consider using the /boot partition if you really want to save your running config.
If you’re considering a donation to FOSS, join me in sending a few bucks over to CGSecurity.
remove, recurse, force
wrong path, there is no backup
desperate panic
Git. Why you would even think to use anything else is…weird.
Data recovery is a complete shot in the dark in a situation like this.
If you commit often, you don’t have to worry about data loss, and git already has a workflow for this exact situation: branches.

git checkout work_being_done
# dozens and dozens of commits while working
git rebase -i main
git checkout main
git merge work_being_done
Lets you do any amount of work and save states for each step. You can even push your working branch to the repository, so even if you have data loss like this, you can always just re-pull the repository.
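i.e., assuming the repo has a remote named origin, something like:

git push -u origin work_being_done    # the WIP branch now lives on the remote too
# and if the local copy ever gets nuked:
git clone <remote-url>                # <remote-url> being wherever origin points
git checkout work_being_done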
Yeah, I did a read-through, and despite everything they wrote there’s still no mention of git, which means their project-critical YAML file has no means to roll back changes, bisect issues, manage concurrent development, audit changes, or even be backed up via any git provider out there.
I can see they’re a new dev so I don’t wanna blame them; this is entirely on their project management and experienced devs to put some kind of version control in place.
I worked in a job which basically had me dragging-and-dropping files into a live production environment and I didn’t last more than 8 months before I scarpered for a job with better pay and better development practices.
This doesn’t sound like it’ll help you now, but in the future you really should have cloud-synced backups of that kind of thing.
I’m aware. The post was simply to get a recovery guide out there for a crappy situation.
Usually projects (especially large projects) are kept in a version control system like git. This is a prime reason why. With version control, it wouldn’t have mattered if you deleted the docker compose file, you could just bring it back. Also, usually every change has to go through version control, so you always have a backup of the latest version of the file.

100%. The organization wasn’t there yet, and seeing that I wanted to remain employed at the time, I wasn’t going to put up a fight against management 3 layers above me. Legacy businesses are a different beast when it comes to dumb stuff like that.
This is red flag shit. 3 layers of management trying to restrict version control on project critical code? You need to update your CV and start looking for a better role before they fuck up. I say this from experience.
This happened a while ago and I’m well past it. The point of the post was to help others that ended up in the situation, not sell best practices.
put up a fight against management 3 layers above me
Eh, yeah. I’ve been in that kind of situation before. Sucks.
Still, you should try to go rogue where you can. Not for the company, fuck the company, do it to protect yourself. Like, maybe you could create your own git repo and push the changes there yourself. Don’t tell anyone else, just do it privately. You don’t need to use GitHub, you could push to a local folder on your computer or a USB drive.
git init .
git doesn’t need Github, Gitlab, or even a server. It’s designed to allow devs to cooperate via patches and PRs sent by email.
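For anyone curious, the email workflow is roughly this; no forge, no server, just patch files (flags from memory, so double-check the man pages):

git format-patch -3      # turn the last three commits into .patch files
git send-email *.patch   # mail them out (needs SMTP configured)
# on the receiving end:
git am 0001-*.patch      # apply the patches as real commits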
I’m aware. Any local storage wouldn’t do much about a poorly aimed rm, though.
It’s really easy to configure a self-hosted forgejo instance. Even if you rm your local work, you can clone it from your server. Be that hosted on the same system over localhost, or on another system in your network.
I don’t know if it fits your use-case but a little known feature is to use a second local drive/folder as a remote, like this:
D:
mkdir D:\git_repos\my_project.git
git init --bare D:\git_repos\my_project.git
C:
cd C:\path\to\your\project
git init
git remote add origin file:///D:/git_repos/my_project.git
This way, you can now push to origin and it will send your commits to your repo on your second drive.
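Usage after that is just the normal flow, e.g.:

git push -u origin main                          # commits land in the repo on D:
git clone file:///D:/git_repos/my_project.git    # and you can clone it back if C: ever loses the working copy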
I’m aware, but thank you. This post was intended to be a guide for people that end up in this situation.
A lot harder to rm a whole directory vs a single file. And even then you can git init --bare a “remote” directory on the local machine that you push to, to have a backup copy.
Not trying to victim blame but your org was kind of asking for it here. I hope someone above takes responsibility for the situation they put you in.
You can just “git init .” on your PC somewhere, copy relevant stuff into it occasionally, and commit. Might not be automated, might not be used directly in production (or on the prototype), but it at least exists.
Why it wasn’t in version control is the real issue here…
I’m not denying that stupid stuff happened or that this was entirely preventable. There are some practical reasons, unique to large, slow-moving orgs, that explain why it wasn’t (yet) in version control.
100% my stack going forward. Thanks!
Photorec, on the other hand, was truly a gift from the cosmos
Can confirm. Over the years I’ve had recourse to this little tool several times and always found it to be almost disturbingly effective.
Disturbingly effective is definitely the right phrase. It’s actually inspired me to create a script on my desktop that moves folders to ~/Trash, plus another script that /dev/random’s the files and then /dev/zero’s them before deletion. It eliminates the risk of an accidental rm, AND makes sure that once something is gone, it is GONE.
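If anyone wants to copy the idea, a rough sketch of the two scripts; I’m using shred here, which does the random pass and the zero pass in one go, instead of catting /dev/random and /dev/zero by hand:

# "rm" replacement: move things into ~/Trash instead of deleting them
trash() { mkdir -p ~/Trash && mv -- "$@" ~/Trash/; }

# emptying: one random pass, then a zero pass, then unlink
empty_trash() {
    find ~/Trash -type f -exec shred -n 1 -z -u {} +
    rm -rf ~/Trash/*
}
# caveat: on CoW or journaling filesystems shred’s overwrite guarantees are weaker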
Yep, I use trash-put, and trash-empty with a 30-day timeout. But no bit-scrubbing needed because the partition is encrypted.

That’s the move.
While you can’t use Syncthing to share a git repo, it actually works quite well in an A -> B setup, where updates happen only on A and versioned backup is enabled on B. YMMV tho.
Once upon a time I stumbled on a tool called “fsfreeze”. Might be useful.
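If I’m remembering the man page right, it’s just the below, and it could be a way to halt writes instantly while you work out the unmount (mount point is a placeholder):

fsfreeze -f /mnt/app-data    # suspend all writes to the filesystem
fsfreeze -u /mnt/app-data    # resume them later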
As others pointed out, version control is probably the best fix for this, in addition to traditional backups. My goal in this post was to help others who haven’t learned that lesson yet save their ass, and maybe learn it in a less painful way.
Docker is annoying and unnecessary for a lot of the situations people use it in.
That said, do you think this would’ve happened if you were using a GUI instead of the command line?
I genuinely believe that we, as devs, need to move away from the command line whenever possible. I know it’s unpopular among our communities. I know there’s a snowball effect that has taken hold.
I still think issues like these would happen way less if people used software that was easy for them, not just the person making it.
My secret Linux shame is that however much I try, I just can’t understand Docker at all. Like I get the general idea of what it is, but I can’t visualise how it works if that makes sense.
I have an app that runs in Docker, that I installed by just following the instructions, but I don’t know where it is on my computer or what exactly it’s doing, which I don’t really like.
If I don’t have the file elsewhere, I restore from backup.