Concurrency in Django: caching, force-refreshing objects from the database and MySQL transactions

UPDATE 17/11/10 – see our follow-up post on this topic

Here’s a little story about how difficult we found it to implement a ‘last modified’ check on a django object, to deal with the possibility of the data in the database being out of sync with django’s in-memory view of an object.  Essentially, getting django to ‘refresh’ its querysets and its view of objects isn’t entirely straightforward, for reasons that aren’t very well documented*

Well, to be fair, they are documented, but not exactly where you’d expect. Read on!

First, the setup. In dirigible, we have a view that allows the user to request that their spreadsheet be recalculated, and the results sent back in JSON format.  Here’s a simplified version of the code.

def recalculate_sheet_and_return_json(request, sheet_id):
    recalc_start_time = datetime.now()
    sheet = Sheet.objects.get(pk=sheet_id)
    sheet.recalculate()  #recalculation may take a while
 
    # check if another request has come in and changed last_modified
    if sheet.last_modified > recalc_start_time:  #hmm, doubt this will work
        return HttpResponse('recalc aborted')
    else:
        sheet.save()
        return jsonify(sheet)

Recalculation can take a little while, during which time the user may take some actions that would interrupt the recalc, or make it invalid – for example, if they change the contents of a cell, then the recalc is based on out-of-date information, and may as well be discarded (a new recalc on the updated grid is called for automatically on the client side). We’ve set it up so that operations like this will update the last_modified field and save the object to the database.

At first, we thought django queryset caching was the problem, but that wasn’t it…

Just looking at the ‘naive’ code above, we immediately suspected that the sheet.last_modifed call wouldn’t work – it would correspond to whatever the last_modified time was when we first loaded the object, before the recalculation.  This seems to agree with the django docs, which tell us that a queryset is only evaluated once, when it’s first needed, and its results are subsequently cached

Making a new queryset would seem to be the solution:

[...]
# check if another request has come in and changed last_modified
sheet_in_db = Sheet.objects.get(pk=sheet_id)
if sheet_in_db.last_modified > recalc_start_time:
    return HttpResponse('recalc aborted')
[...]

We were frustrated when we found this wouldn’t work!  A quick search on the django-users mailing list showed we weren’t the only ones – you see posts about 3 or 4 times a year entitled “aaarg how can I get django to force refresh an object from the database???”, with answers of varying quality. We eventually found the answer we were looking for in this discussion which we’re happy to exhume for you.

In fact, it was about the way django and MySQL wraps views in isolated transactions…

in short:  by default, django wraps all view code in a transaction.  And by default, MySQL enforces consistency on transactions by isolating them from all other transactions that might be concurrent with it – in other words, one transaction can’t see what another one might have changed under its feet.  This is called ‘repeatable read’ transaction isolation. More info in this nice readable guide, and here in the MySQL docs.

The upshot of which is that, by default, django views can’t see changes made concurrently by other threads, users or processes.

How to fix this?  django allows more fine-grained control over transactions using several decorators.  we used @transaction.commit_manually

This allowed us to force the view’s open transaction to complete before we re-read the database, thus opening a new transaction which has access to any updates committed in the meantime:

@transaction.commit_manually
def recalculate_sheet_and_return_json(request, sheet_id):
    #begin_transaction is implied
    recalc_start_time = datetime.now()
    sheet = Sheet.objects.get(pk=sheet_id)
    sheet.recalculate()
 
    transaction.commit() #end current transaction, allows next queryset to view fresh version of db
    sheet_in_db = Sheet.objects.get(pk=sheet_id)
    if sheet_in_db.last_modified > recalc_start_time:
        return HttpResponse('recalc aborted')
    else:
        sheet.save()
        transaction.commit() #needed to commit the save above
        return jsonify(sheet)

This seems to work the way we want it to.  The alternative would be to change the way InnoDB manages transactions, but that seems a little too general, and we don’t want to stray too far from the vanilla config of Amazon RDS…

Simples!

2 Comments

  • By Semion, 21 March 2011 @ 4:15 pm

    You just saved me hours. <3

  • By Gonzalo, 27 May 2011 @ 4:05 am

    Thanks!!!! I had the same problem!

Other Links to this Post

RSS feed for comments on this post. TrackBack URI

Leave a comment

WordPress Themes