During my recent vacations, I had some time to work on Obnam, my backup program. I've been rewriting it, and had gotten it to work again, but it was slow. Profiling showed the problem to be the in-memory LRU cache for "backup objects". It's not relevant here what backup objects are, except that each one has an identifier field. During a backup run, these objects need to be cached in memory, for speed, but not too many of them, so that memory requirements don't blow up.

My first version of the LRU cache was basically just a list, in which the most recently used object was kept at the end. Every time I accessed an object in the cache, it was moved to the end. Turns out, linear search is slow when you do it millions of times and the list is tens of thousands of items long. Who would've guessed.
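That first version is long gone, but it would have looked something like this sketch (Obj is a made-up stand-in for a backup object; only its id field matters here):

```python
class Obj(object):

    """Stand-in for a backup object; only the id field matters."""

    def __init__(self, objid):
        self.id = objid


class ListCache(object):

    """The naive list-based LRU cache: O(n) per lookup."""

    def __init__(self, max_objs):
        self.max = max_objs
        self.objs = []  # most recently used object lives at the end

    def get(self, objid):
        # Linear search over the whole list, every single time.
        for i, obj in enumerate(self.objs):
            if obj.id == objid:
                del self.objs[i]
                self.objs.append(obj)  # move to end: most recent
                return obj
        return None

    def put(self, obj):
        if self.get(obj.id) is None:
            self.objs.append(obj)
            if len(self.objs) > self.max:
                del self.objs[0]  # forget the least recently used
```

With millions of calls against a list tens of thousands of entries long, that inner loop is where all the time goes.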

I set out to write a better one. The API of the cache class is basically this:

class ObjectCache(object):
    max = N # max number of objects in cache
    def get(self, objid): 
        """Return corresponding object, or None."""
    def put(self, obj):
        """Add object, maybe forget least recently used one."""

Both get and put need to be fast, preferably O(1), but certainly much faster than O(max).

My solution is based on the realization that Python's built-in hash table (dictionary) is pretty fast, so I might as well use two.

First, age is indicated by an integer that increases monotonically with every access. I call this counter. When I access an object, I store the current value of counter as the object's age, and then I increment counter. No two objects ever have the same age.

values maps object identifiers to tuples of object and counter value. ages maps counter values back to object identifiers. smallest keeps track of the smallest counter value still present in ages, i.e. the age of the least recently used object.

At any one time (invariant!), for a given cached object, values[obj.id] == (obj, age) and ages[age] == obj.id.
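To make the bookkeeping concrete, here is a toy run of the two dictionaries on their own, with short strings standing in for the actual backup objects:

```python
# Toy illustration of the two-dict bookkeeping; strings stand in
# for the cached objects and their identifiers.
values = {}   # object id -> (object, age)
ages = {}     # age -> object id
counter = 0

for objid in ['a', 'b', 'a', 'c']:
    if objid in values:
        obj, old_age = values[objid]
        del ages[old_age]        # accessing an object refreshes its age
    else:
        obj = objid.upper()      # pretend this is a real backup object
    values[objid] = (obj, counter)
    ages[counter] = objid
    counter += 1

# The invariant holds: every entry in values points back via ages.
for objid, (obj, age) in values.items():
    assert ages[age] == objid

# ages is now {1: 'b', 2: 'a', 3: 'c'}: 'b' is least recently used.
```

After this run, smallest would be 1, and evicting would mean deleting values['b'] and ages[1].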

get retrieves the tuple from values, removes the corresponding entry in ages, puts a new tuple (with a new age) back into values, and updates ages and smallest. That's several dictionary accesses, each of which is O(1) on average for Python dictionaries, so the whole operation is fast enough. (The loop that advances smallest looks worrying, but each counter value is skipped at most once over the cache's lifetime, so it's amortized O(1) too.)

put works in a similar way, except that it may need to remove the least recently used object. It finds that object's identifier via ages[smallest].

I hope that makes sense. Perhaps the full code will help:

class ObjectCache(object):

    """Cache objects in memory."""

    def __init__(self):
        self.counter = 0 # age counter; incremented on every access
        self.values = {} # indexed by object id, gives (object, counter)
        self.ages = {}   # indexed by counter, gives object id
        self.smallest = -1 # smallest remembered counter value
        # Compute a default max cache size by assuming a one megabyte
        # block size and a 64 byte object size.
        self.max = 1000 * 1000 // 64

    def get(self, objid):
        pair = self.values.get(objid)
        if pair is None:
            return None
        obj, counter = pair
        self.values[objid] = (obj, self.counter)
        del self.ages[counter]
        self.ages[self.counter] = obj.id
        self.counter += 1
        while self.smallest not in self.ages:
            self.smallest += 1
        return obj

    def put(self, obj):
        if obj.id in self.values:
            del self.ages[self.values[obj.id][1]]
        self.values[obj.id] = (obj, self.counter)
        self.ages[self.counter] = obj.id
        self.counter += 1
        while self.smallest not in self.ages:
            self.smallest += 1
        if len(self.values) > self.max:
            del self.values[self.ages[self.smallest]]
            del self.ages[self.smallest]
            while self.smallest not in self.ages:
                self.smallest += 1

Now that I blog about this, I'm sure someone will point out a horrible mistake in the code, or some data structure or algorithm that will make the whole thing O(1). However, the above code dropped my test case execution time from about 90 minutes to less than 5, so I'm already happy.
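For the record, Python 3's collections.OrderedDict (or a hand-rolled doubly linked list) would indeed get both operations to O(1). A minimal sketch with the same interface, under the same assumption that objects carry an id field (the class name is made up, not anything in Obnam):

```python
from collections import OrderedDict


class OrderedDictCache(object):

    """LRU cache with O(1) get and put, using OrderedDict, which
    remembers insertion order and can move a key to the end."""

    def __init__(self, max_objs):
        self.max = max_objs
        self.objs = OrderedDict()  # object id -> object, oldest first

    def get(self, objid):
        obj = self.objs.get(objid)
        if obj is not None:
            self.objs.move_to_end(objid)  # mark as most recently used
        return obj

    def put(self, obj):
        if obj.id in self.objs:
            self.objs.move_to_end(obj.id)
        self.objs[obj.id] = obj
        if len(self.objs) > self.max:
            self.objs.popitem(last=False)  # drop least recently used
```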