I was travelling on a train today, and got a seat at a table, with a power socket. I decided to do some coding to pass the time, and picked the task of measuring the speed of summain, my tool for generating manifests of directory trees. Think md5sum or sha1sum, but printing out much of the inode information as well: the goal is that I can diff the manifests made before a backup and after a restore, and see that everything is OK.

So I wrote a little script to time any checksummer for a particular set of files, and got the following results:

113 ./summain
112 md5sum
114 sha1sum
121 sha512sum
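
For the curious, a timing script along these lines could look like the sketch below. It is not the actual script used for the numbers above: the names time_command and compare are made up for illustration, and it reports wall-clock seconds, which may or may not be the unit of the figures above.

```python
import subprocess
import time


def time_command(argv, files):
    """Run argv with the given files appended, once; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        list(argv) + list(files),
        stdout=subprocess.DEVNULL,  # discard the checksums, we only want timing
        check=True,                 # fail loudly if the command fails
    )
    return time.perf_counter() - start


def compare(commands, files):
    """Time each command over the same files; return (seconds, name) pairs."""
    return [(time_command(argv, files), " ".join(argv)) for argv in commands]


# Example (assumes the tools are installed and ./summain exists):
#   for secs, name in compare([["./summain"], ["md5sum"]], ["a.iso", "b.iso"]):
#       print(f"{secs:.0f} {name}")
```

A single run like this is sensitive to disk caching, so the first command timed may look slower than the rest unless the files are already in the cache.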

It doesn't look like I need to optimize summain at all: it's already as fast as md5sum and sha1sum, even though it's written in pure Python. (That's because it uses the Python standard library module hashlib, which is, of course, implemented in C.)
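
To show why the Python layer costs so little, here is a minimal sketch of checksumming a file with hashlib. It is not summain's actual code; the function name file_checksum and the chunk size are assumptions. The Python loop only hands large chunks to hashlib, so nearly all the time is spent in compiled code.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Reading in chunks keeps memory use constant no matter how large the file is.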

Here's a sample output from summain, for the curious:

Mtime: 2011-02-13 09:59:09.000000 +0000
Mode: 100644
Ino: 1
Dev: 1
Nlink: 1
Size: 1256
Uid: 1000
Username: liw
Gid: 1000
Group: liw
Sha-1: b7fa5517ea20311d13128312456c0dba3da3b11d
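
Output in roughly this shape can be produced from os.lstat plus a checksum. The sketch below is an illustration only, not how summain itself is written: the function names are hypothetical, it handles regular files only, it works on Unix only (it needs the pwd and grp modules), and it prints the raw inode and device numbers as the system reports them.

```python
import grp
import hashlib
import os
import pwd
from datetime import datetime, timezone


def user_name(uid):
    """Return the login name for uid, or '' if there is no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return ""


def group_name(gid):
    """Return the group name for gid, or '' if there is no group entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return ""


def manifest_entry(path):
    """Return a multi-line manifest entry for one regular file."""
    st = os.lstat(path)
    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    with open(path, "rb") as f:
        sha1 = hashlib.sha1(f.read()).hexdigest()  # whole file in memory: sketch only
    fields = [
        ("Mtime", mtime.strftime("%Y-%m-%d %H:%M:%S.%f %z")),
        ("Mode", format(st.st_mode, "o")),  # octal, e.g. 100644
        ("Ino", st.st_ino),
        ("Dev", st.st_dev),
        ("Nlink", st.st_nlink),
        ("Size", st.st_size),
        ("Uid", st.st_uid),
        ("Username", user_name(st.st_uid)),
        ("Gid", st.st_gid),
        ("Group", group_name(st.st_gid)),
        ("Sha-1", sha1),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields)
```

Two such entries for the same file, one from before a backup and one from after a restore, can then be compared with plain diff.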

Luckily I had some other things I could work on, so the rest of the train ride wasn't too boring.