Benchmark results before:
num_operations: 1000000
insert      : 191.090 s (  5238.3/s) CPU; 191.106 s (  5237.7/s) wall clock
lookup      :  30.430 s ( 33068.8/s) CPU;  30.428 s ( 33064.0/s) wall clock
lookup_range:  85.700 s ( 11694.5/s) CPU;  85.700 s ( 11693.6/s) wall clock
insert2     : 143.320 s (  6986.7/s) CPU; 143.315 s (  6986.6/s) wall clock
remove      : 246.010 s (  4068.0/s) CPU; 246.013 s (  4067.9/s) wall clock
remove_range: 225.210 s (  4444.0/s) CPU; 225.286 s (  4442.4/s) wall clock
That's insert, lookup, and remove operations (plus their range variants) on a B-tree with one million keys. insert2 is insertion where the keys arrive in sorted order. All nodes are stored in memory, not on disk.
After some micro-optimization (replacing my own flexible bsearch implementation with the Python standard library one):
num_operations: 1000000
insert      :  97.850 s ( 10237.5/s) CPU;  97.860 s ( 10237.1/s) wall clock
lookup      :  12.430 s ( 81566.1/s) CPU;  12.429 s ( 81609.8/s) wall clock
lookup_range:  52.670 s ( 19047.6/s) CPU;  52.669 s ( 19050.0/s) wall clock
insert2     :  72.600 s ( 13806.4/s) CPU;  72.595 s ( 13808.6/s) wall clock
remove      : 134.510 s (  7443.8/s) CPU; 134.512 s (  7444.0/s) wall clock
remove_range: 169.740 s (  5897.3/s) CPU; 169.795 s (  5895.6/s) wall clock
Not bad, for a train ride's work, with some additional improvements late at night in a hotel room.
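The micro-optimization was replacing a hand-rolled binary search with the Python standard library's equivalent, which I take to be the bisect module (its binary search is implemented in C). A minimal sketch of how node lookup might use it — the function name and key layout here are hypothetical, not taken from my actual code:

```python
import bisect

def find_slot(keys, key):
    # Return the leftmost index where key fits in the sorted key
    # list of a node. bisect_left does the binary search in C,
    # which is what made it faster than a pure-Python bsearch.
    return bisect.bisect_left(keys, key)

# Hypothetical node: four sorted keys.
keys = [10, 20, 30, 40]
# An existing key returns its own index; a missing key returns
# the index of the child subtree (or insertion point) to descend into.
print(find_slot(keys, 30))  # 2
print(find_slot(keys, 25))  # 2
```

The same index serves double duty: it tells you whether the key is present (compare keys[i] against key) and, in an interior node, which child to recurse into.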
The code is clearer now too, at least mostly. And there's less of it: duplicating standard library functionality always makes me feel icky.
The numbers for removals look bad, really bad. Range removal especially should be much faster. If anyone wanted to get familiar with my btree code base, that would be an excellent place to start. See the README for instructions.