Some weeks ago I noticed that one of my backup servers was not removing the old backup generations it was making. It had over 18 months' worth of daily backups, done with rsync and “cp -al”. This is not the world’s greatest backup system, but that’s entirely irrelevant for this rant.

The cron job that would do that did not exist: I had forgotten to enable it.

So I ran it by hand. Then I fixed it, once I noticed it was buggy: it wasn’t removing any generations older than 2010. After that I removed old backups of machines that were decommissioned years ago.

Disk usage:

         Size  Used  Avail  Use%
 before  5.4T  2.6T   2.6T   51%
 after   5.4T  892G   4.3T   18%

           Inodes     IUsed      IFree  IUse%
 before 732577792  77645915  654931877    11%
 after  732577792  16871789  715706003     3%

That’s about 61 million files removed, over the course of three weeks, during which the machine was down for about one week due to crashes.

It should not take two weeks to copy a couple of terabytes (that works out to less than two megabytes per second). It should definitely not take two weeks to delete them. Almost all the time the machine was up, it was removing files. There were a couple of times when it finished one run and sat idle until I woke up and started the next; call it a day lost in total.
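A back-of-the-envelope check of that “less than two megabytes per second” figure, assuming the df numbers above and roughly two weeks of actual uptime (my reading of the timeline, not a measured number):

```python
TIB = 2 ** 40
GIB = 2 ** 30

# Used space from the df table: 2.6T before, 892G after (df -h units,
# assumed binary). The difference is roughly what was freed.
bytes_freed = int(2.6 * TIB) - 892 * GIB

# About three weeks elapsed, minus about one week of downtime.
uptime_seconds = 14 * 24 * 60 * 60

rate_mb_per_s = bytes_freed / uptime_seconds / 1e6
print(f"{rate_mb_per_s:.2f} MB/s")
```

Under those assumptions the sustained rate comes out well under two megabytes per second, which any disk from the past two decades should beat easily for sequential work.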

Why is “sudo rm -rf” on a large directory tree, or a set of large directory trees, so slow? I suspect part of the problem is that the system call API is beautiful but simplistic: you have to call unlink or rmdir on every file separately, and that means a separate inode lookup for each one. A system call that deletes everything in a directory would have a chance of removing things more efficiently, I suspect. However, since I’ve not done any profiling, nor even analyzed the relevant source code in the kernel, I might be entirely wrong.
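To make the per-file cost concrete, here is a rough sketch of the work any recursive delete (rm -rf, shutil.rmtree, and so on) has to do under the POSIX API. Every single entry costs at least one system call, and each of those calls does its own name lookup:

```python
import os

def rmtree(path):
    # Walk the tree bottom-up: files must be unlinked before their
    # parent directory can be removed with rmdir.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                rmtree(entry.path)        # recurse, rmdir happens below
            else:
                os.unlink(entry.path)     # one unlink() syscall per file
    os.rmdir(path)                        # one rmdir() syscall per directory
```

With 61 million files, that is at least 61 million unlink calls, plus the getdents traffic to list the directories in the first place; there is no way to say “drop this whole subtree” in one call.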

I wonder whether anyone else needs to remove large directory trees frequently. I do it a lot, because of my backup software development hobby, but I may be in a unique position, and perhaps nobody else would benefit from faster removal.

(I’ll note that my backup server is using ext3, and that ext4 is much faster at removing files, particularly large files, but it’s still not fast enough.)