I've been learning Rust lately. As part of that, I rewrote my summain program from Python to Rust (see summainrs). It's not quite a 1:1 rewrite: the Python version outputs RFC822-style records, the Rust one uses YAML. The Rust version is my first attempt at using multithreading, something I never added to the Python version.

Results:

  • Input is a directory tree with 8.9 gigabytes of data in 9650 files and directories.
  • Each file gets stat'd, and regular files get SHA256 computed.
  • Run on a Thinkpad X220 laptop with a rotating hard disk. Two CPU cores, 4 hyperthreads. Mostly idle, but desktop-py things running in the background. (Not a very systematic benchmark.)
  • Python version: 123 seconds wall clock time, 54 seconds user, 6 second system time.
  • Rust version: 61 seconds wall clock (50% of the speed), 56 seconds user (104%), and 4 seconds system time (67&).

A nice speed improvement, I think. Especially, since the difference between the single and multithreaded version of the Rust program is four characters (par_iter instead of iter in the process_chunk function).