Developing PythonAnywhere on PythonAnywhere

Yo dawg, I heard you like dogfood…

https://en.wikipedia.org/wiki/Eating_your_own_dog_food

As we’re developing an online IDE, it made a lot of sense to us that we should “dogfood” as much as possible, so we’re actually using PythonAnywhere to develop PythonAnywhere. When it’s come up in conversation, people are occasionally surprised or impressed, and they seem to think we should make a bigger deal out of it, so here you go!

It’s an evolving process, but here’s an outline of how we work now, and some of our hopes for the future.

Servers, servers, servers…

illustration of dogfood, dev and local environments

At the heart of it is a cluster of “dogfood” servers that are essentially a clone of our production environment, with a few minor differences. We use the “dogfood” version of PythonAnywhere to do almost all our development: from its bash consoles we can SSH into our dev servers and, for example, reboot Apache. We use the browser-based editor, or maybe vim from another bash console, to make changes to the source code, and those changes end up on the dev servers because they have the source tree mounted via NFS – that’s the only bit you can’t currently do on the live production PythonAnywhere.

At the moment we still tend to run our functional (selenium) tests from a local PC, but even that can start to change now that we’ve released headless selenium testing support. So we’re really very close to doing all our development for PythonAnywhere, entirely on PythonAnywhere.

A day in the life…

Here’s how a typical day might go.

I can start by logging into my user account on pythonanywhere-dogfood – they’re normal user accounts, no admin access or anything special. I open up a bash console, and check out the main pythonanywhere repo from GitHub.

The repo contains everything I need to build a dev server from scratch. We recently switched from running our dev servers on VMs on a box under one of our desks, to running them as Amazon EC2 micro instances. So, in the same bash console, I can run a single command, ./create-and-start-servers.py dev all, and it will spin up a PythonAnywhere dev cluster on Amazon. We use fabric to automate the building and config of each server, and we save disk images with all the libraries we need on them.

Here’s one bit where we cheat: Our dogfood server allows sharing over NFS, so each of my dev servers actually has my source tree on the dogfood server mounted, to use as its own codebase.

Once my server is up and running, I can use a separate checkout of the source tree on my local PC to write functional tests, and run those FTs against my dev server. Since we do pair programming, by then I’ve recruited a colleague to help me out (we have a rota, so no-one is forced to work with me every single day. There is the Geneva convention to think about…).

Once we’ve written an FT and seen it run from our local PC against the dev servers, we can switch back to dogfood to start writing our unit tests. We’ve got several vim addicts in the office, so that will probably happen in a console session, or we might use the browser-based editor.

To run the unit tests, we’ll have an SSH session logged into the dev server, which can run a ./manage.py test, for example. Then we can get on with writing the actual code. The NFS share is a convenience here: it means that our edits are immediately reflected on the server, so we can see the effect of code changes straight away – but we could use…

When we’re happy all our tests pass, we can commit them, and push them back up to github, where they’ll be picked up in the next run of our integration loop. And then it’s off to the pub!

In summary:

On PythonAnywhere                                             | On local PCs (or cheating)
All source code editing (either in the editor or console vim) | Running functional tests
DVCS – git commits, pull/push to github                       | Pushing changes to dev environments
VM control: spin up EC2 instances, reboot apache etc          | Project management (we use trac)
Unit testing                                                  |
Deploying new versions to live                                |

What works well, what we want to make work better:

We started to move to more and more dogfooding over the last two months, and it’s interesting to look at what’s been good and bad about that.

On the plus side, we get all the benefits of having a cloud-based dev environment. IRL, that translates into things like making it easier to work from home – I was off sick the other day, and being able to access the full dev environment from my home PC made it possible for me to get some real work done… with no need for any complicated VPN antics. I could also pick up from exactly where I left off, with my vim and bash shells just where I’d left them the previous day.

It also really helps for weekend support. We like to make sure we pair up for any kind of serious support task, so having a shared console on pythonanywhere, alongside a skype call, means it’s easy for two of us to dive into the servers, test out fixes and so on, even when we’re not in the room together.

In the office, having access to each other’s consoles means we don’t have to keep hopping between desks too. It’s only a minor thing, but it helps… And, before you ask, we do know about screen, but screen is far from hassle-free. Again, just yesterday, Giles was unable to log into a screen session I’d left running for him on one of our legacy servers because of a weird permissions issue – pythonanywhere doesn’t give us those sorts of annoyances.

Which is not to say PythonAnywhere doesn’t give us other annoyances!

The hope, with dogfooding, is that if you’re forced to use the same tool as your customers, you’ll get round to fixing irritations more quickly! So here’s what’s been frustrating us and is therefore getting pushed up our to-do list:

  • better copy & pasting inside consoles. Although the Ctrl+Shift+V solution is a reasonable workaround in chrome, it really needs to be a lot better, work in all browsers, and support multi-line pastes
  • improve console responsiveness. We’re forcing ourselves to use servers in the US, so there’s a fair bit of latency, and that means typing into consoles is not as pleasant as it should be. We’re looking into putting servers in more locations around the world, but also into whether some kind of local keystroke echo might help
  • improve the browser-based editor. It’s got the basics, but it needs more, things like multi-file find & replace, and some kind of scripting ability

Aside from fixing those bugs, we want to start running selenium tests directly from PythonAnywhere, if we can. The trouble with this is that it’s harder to debug when you can’t put a “time.sleep” and see what’s happening in the real browser window… but a judicious use of screenshots could potentially work just as well… We also want to stop “cheating” by using an NFS mount, and develop PythonAnywhere on the production PythonAnywhere servers – maybe some judicious use of github callbacks, or even Dropbox…
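Here’s a rough sketch of what the “judicious use of screenshots” idea might look like: wrap a test step so that, if it fails, a screenshot gets saved before the failure propagates. save_screenshot is a real Selenium WebDriver method; the screenshot_on_failure helper and the directory layout are made up for illustration, not something we actually have in our codebase.

```python
import functools
import os
import time


def screenshot_on_failure(browser, directory="screenshots"):
    """Decorator factory: save a screenshot if the wrapped step raises.

    `browser` is anything with a Selenium-style save_screenshot(path) method.
    The helper name and directory layout are illustrative assumptions.
    """
    def decorator(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            try:
                return step(*args, **kwargs)
            except Exception:
                # Make sure there is somewhere to put the image.
                os.makedirs(directory, exist_ok=True)
                filename = "%s-%d.png" % (step.__name__, int(time.time()))
                browser.save_screenshot(os.path.join(directory, filename))
                raise  # re-raise so the test still fails
        return wrapper
    return decorator
```

The point of re-raising is that the screenshot is only a debugging aid; the test result itself is unchanged.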

But, in the meantime, things are working pretty well… Of course, the danger with dogfooding is that you get so used to your tool, and especially you get used to workarounds for little bugs, that you forget what it’s like for newer users. So, beloved users, if you’re using PythonAnywhere for serious development, what’s your experience? What do you think we need to change?

Running headless Selenium browser tests on PythonAnywhere with Xvfb

You can now run Selenium tests from PythonAnywhere, using a virtual display.

PythonAnywhere, for those that don’t know, is our attempt at a fully browser-based Python development environment. So that includes an editor, consoles to actually run your scripts, and a web hosting platform too… But until now, it hasn’t been perfect for the Best Form of Development(TM), namely test-driven development, TDD.

Being an XP shop, we’re very keen on TDD and functional testing, for which we use Selenium to drive a real web browser, and check the actual behaviour of our site. Until now we’ve not been able to “dogfood” our selenium tests and run them from PythonAnywhere itself, because opening up a real web browser tends to mean needing a real display, something our servers don’t have.

But a solution exists! Even “headless” servers can use a virtual display like Xvfb to spoof apps like Firefox into running, even if there’s no real screen for them to actually be displayed on. Inspired by this post, we’ve now added the binaries for Xvfb and Firefox (well, iceweasel actually) to our servers, as well as the excellent pyVirtualDisplay module, so running selenium tests on PythonAnywhere is now as easy as this:

from pyvirtualdisplay import Display
from selenium import webdriver
 
display = Display(visible=0, size=(800, 600))
display.start()
 
# we can now start Firefox and it will run inside the virtual display
browser = webdriver.Firefox()
browser.get('http://www.google.com')
print(browser.title)  # this should print "Google"
 
#tidy-up
browser.quit()
display.stop() # ignore any output from this.

(Code shamelessly stolen from Corey Goldberg)
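One thing the snippet above glosses over: if a test blows up halfway through, the tidy-up lines never run, and Firefox and Xvfb may be left hanging around. A minimal sketch of one way to guard against that is a context manager that always cleans up. The headless_browser name is hypothetical, and it takes factories rather than importing pyvirtualdisplay or selenium itself, so the only things it assumes are the start(), stop() and quit() calls already used above.

```python
from contextlib import contextmanager


@contextmanager
def headless_browser(make_display, make_browser):
    """Start a virtual display, then a browser, and always tidy both up.

    make_display and make_browser are zero-argument callables
    (hypothetical parameter names chosen for this sketch).
    """
    display = make_display()
    display.start()
    try:
        browser = make_browser()
        try:
            yield browser
        finally:
            browser.quit()   # runs even if the test body raised
    finally:
        display.stop()       # runs even if the browser failed to start


# Intended usage with the real libraries (not executed here):
#
#   from pyvirtualdisplay import Display
#   from selenium import webdriver
#
#   with headless_browser(lambda: Display(visible=0, size=(800, 600)),
#                         webdriver.Firefox) as browser:
#       browser.get('http://www.google.com')
```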

Give it a go! And if you need some inspiration for what kind of tests to run, I can recommend an excellent TDD/Django/Selenium tutorial ;-)

PythonAnywhereAnywhere

We recently added something cool to PythonAnywhere — if you’re writing a tutorial, or anything else where you’d find a Python console useful in a web page, you can use one of ours! Check it out:

What’s particularly cool about these consoles (apart from the fact that they advertise the world’s best Python IDE-in-a-browser) is that they keep the session data on a per-client basis — so, if you put one on multiple pages of your tutorial, the user’s previous state is kept as they navigate from page to page! The downside (or is it an upside?) is that this state is also kept from site to site, so if they go from your page to someone else’s, they’ll have the state they had when they were trying out yours.
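To make the behaviour concrete, here is a toy model of it. This is emphatically not how PythonAnywhere implements the consoles; the ConsoleSessions class, the client IDs and the URLs are all invented for illustration. The only point is that state is looked up by client, and the embedding page plays no part in the lookup.

```python
class ConsoleSessions:
    """Toy model: console state is keyed by client, not by embedding page."""

    def __init__(self):
        self._namespaces = {}

    def run(self, client_id, page_url, source):
        # page_url is deliberately ignored: the same client gets the same
        # namespace whichever page (or site) the console is embedded in.
        namespace = self._namespaces.setdefault(client_id, {})
        exec(source, namespace)
        return namespace


sessions = ConsoleSessions()
sessions.run("client-a", "http://tutorial.example/page1", "x = 1")
# A different site entirely, but the same client still sees x.
state = sessions.run("client-a", "http://elsewhere.example/", "y = x + 1")
```

Keying on the page as well as the client would give the “feature” without the “bug”, at the cost of losing state between sites that want to share it.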

Bug or feature? Let me know what you think in the comments…

[Cross-posted from gilesthomas.com]

Secure socket.io – websockets et al over SSL/TLS with tornadio2

I was surprised at how easy it was to enable SSL on socket.io with Tornado and Tornadio2 – read on!

We’ve been implementing some transport-layer security on PythonAnywhere, with HTTPS for our and our users’ web pages (due to go live in the next few days), but as some of you may know, normal HTTP(S) connections are only a part of PythonAnywhere.

PythonAnywhere Console session showing HTTPS websockets

For our in-browser console sessions, we use socket.io to carry user keystrokes and console output to and from our servers. And, until now, that traffic was all unencrypted. Well, no longer!

We have to give huge, huge props to MrJoes and the Tornadio team for their new project, tornadio2, which brings in compatibility with the latest improvements in socket.io and the websockets protocol. And massive thanks to the socket.io and Tornado teams too.

I decided to start by seeing whether I could adapt one of the standard tornadio examples to use SSL. This turned out to be an extraordinarily simple, 2-step process:
1 – amend socket.io on the client side to use https
2 – amend the tornado server side to use SSL.

Here’s the code:

Client-side – before:

conn = io.connect('http://' + window.location.host + '/', {

After:

conn = io.connect('https://' + window.location.host + '/', {

Server-side – before:

tornadio2.server.SocketServer(application)

After:

tornadio2.server.SocketServer(application, ssl_options={
    "certfile": "server.crt",
    "keyfile":  "server.key",
})

That’s really not too bad, is it?

All websockets connections are now secure, using the wss:// protocol. Here’s the Chrome Dev toolbar proving it!

PythonAnywhere Console with Chrome dev window showing WSS connection

XHR-polling, and JSON-polling sessions also use HTTPS.

I followed this guide to create self-signed ssl certificates for testing. And, you can see my example live on GitHub, where MrJoes has accepted my pull request! https://github.com/MrJoes/tornadio2/tree/master/examples/ssl_transports
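One practical wrinkle with the server-side change: relative paths like "server.crt" are resolved against the current working directory, so starting the server from a different directory is an easy way to get a confusing failure. A small pre-flight check, sketched below with the standard library only, can report the problem up front. The missing_ssl_files name is hypothetical, and the sketch assumes an ssl_options dict shaped like the one in the example above.

```python
import os


def missing_ssl_files(ssl_options):
    """Return the keys among certfile/keyfile that are unset or not on disk."""
    problems = []
    for key in ("certfile", "keyfile"):
        path = ssl_options.get(key)
        if not path or not os.path.isfile(path):
            problems.append(key)
    return problems


# Intended usage before starting the server (not executed here):
#
#   ssl_options = {"certfile": "server.crt", "keyfile": "server.key"}
#   problems = missing_ssl_files(ssl_options)
#   if problems:
#       raise SystemExit("missing SSL files for: " + ", ".join(problems))
```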

Yo Dawg, I heard you like decorators…

Know your meme…

Everyone loves decorators, right? They’re especially useful for dealing with authorisation for Django views, e.g.:

def authorised_users(view):
    def decorated_view(request):
         if not request.user.has_some_attribute():
             return HttpResponseForbidden('no, bad user!')
         return view(request)
    return decorated_view
 
 
@authorised_users
def view_special_stuff(request):
    return stuff

or, we have one that will repeat a function a certain number of times:

@repeat_for(time=60, wait=5)
def try_something():
    return result
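The body of repeat_for isn’t shown here, so the following is only a guess at what such a decorator might look like. It assumes that time is the total number of seconds to keep trying, that wait is the pause between attempts, and that “try again” means the function raised an exception; none of that is confirmed above. The clock and sleep parameters are extras added so the sketch can be exercised without real waiting.

```python
import functools
import time as time_module


def repeat_for(time, wait, clock=time_module.monotonic, sleep=time_module.sleep):
    """Retry the decorated function until it succeeds or `time` seconds pass.

    Assumed semantics (not taken from the original code): an attempt
    "fails" if it raises, and we pause `wait` seconds between attempts.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = clock() + time
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Give up if another wait would take us past the deadline.
                    if clock() + wait > deadline:
                        raise
                    sleep(wait)
        return wrapper
    return decorator
```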

All well and good – they’re a good way of separating out some housekeeping code and actual application logic. But the question is, how to test them? Let’s say that, before we used a decorator, the tests for our view looked like this:

def test_view_does_what_its_supposed_to_in_case_x(self): ...
def test_view_does_what_its_supposed_to_in_case_y(self): ...
def test_view_allows_authorised_user_type_1(self): ...
def test_view_allows_authorised_user_type_2(self): ...
def test_view_refuses_unauthorised_users(self): ...
def test_view_refuses_anonymous_users(self): ...

Well, the last four of those tests are really tests of the decorator, so they will be copied over into a separate test class that tests the decorator… But how can we then tell whether our view has been decorated? It would be annoying to have to duplicate the tests between the decorator and the view, and any other views that use it. What we really need is a way to tell if a view has been decorated… We need the decorator to “decorate” the function with a little tag, that marks the function as having been decorated:

def authorised_users(view):
    def decorated_view(request):
         if not request.user.has_some_attribute():
             return HttpResponseForbidden('no, bad user!')
         return view(request)
    decorated_view._decorated_with = 'authorised_users'
    return decorated_view

Now our tests for the decorator can just stay with the decorator, and all we need to do in our view tests is:

def test_view_does_what_its_supposed_to_in_case_x(self): ...
def test_view_does_what_its_supposed_to_in_case_y(self): ...
def test_view_decorated_with_authorised_users(self):
    self.assertEqual(view._decorated_with, 'authorised_users')

And while you’re at it, you might decide to make _decorated_with into a set, so that you can have multiple decorators on a view… All fairly natural stuff so far…
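A minimal sketch of the set-based version might look like this. To keep it runnable without Django, a plain string stands in for HttpResponseForbidden, and logged is a hypothetical second decorator that exists only to show two tags stacking up.

```python
def authorised_users(view):
    def decorated_view(request):
        if not request.user.has_some_attribute():
            return "403: no, bad user!"  # stand-in for HttpResponseForbidden
        return view(request)
    # Carry over any tags the wrapped view already has, then add our own.
    tags = set(getattr(view, "_decorated_with", ()))
    tags.add("authorised_users")
    decorated_view._decorated_with = tags
    return decorated_view


def logged(view):
    """Hypothetical second decorator, tagged in the same way."""
    def decorated_view(request):
        return view(request)
    tags = set(getattr(view, "_decorated_with", ()))
    tags.add("logged")
    decorated_view._decorated_with = tags
    return decorated_view


@logged
@authorised_users
def view_special_stuff(request):
    return "stuff"
```

The view test then becomes a membership check, e.g. self.assertIn('authorised_users', view_special_stuff._decorated_with). Notice that the three tagging lines are repeated in both decorators.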

But I think you can tell where this is going. It’s tedious to have to include that little “decorated_with” code in every single decorator you write… Wouldn’t it be cool if we could somehow abstract that out… and, naturally, the best way to do that would be with….

PRESENTING: @baroque – a decorating decorator decorator

def baroque(decorator):
    def tagging_decorator(func):
        decorated_by = set()
        if hasattr(func, "decorated_by"):
            decorated_by = func.decorated_by.copy()
        decorated_by.add(decorator.__name__)
 
        func = decorator(func)
 
        func.decorated_by = decorated_by
 
        return func
 
    tagging_decorator.__name__ = decorator.__name__
    return tagging_decorator

Perhaps the best way to understand this decorator-decorator is through its tests, which I present to you, verbatim:

from functools import wraps
import unittest
 
from decorators import baroque
 
 
class TestBaroque(unittest.TestCase):
 
    def test_decorated_decorator_still_works(self):
        @baroque
        def decorator(func):
            def inner(*args):
                return func(*args) * 2
            return inner
 
        @decorator
        def foo(x):
            return x + 1
 
        self.assertEqual(foo(123), 248)
 
 
    def test_baroque_doesnt_trash_decorator_name(self):
        @baroque
        def decoratey(func):
            return func
 
        self.assertEqual(decoratey.__name__, "decoratey")
 
 
    def test_baroque_decorates_decorated_function_with_names_of_decorators(self):
 
        @baroque
        def simple_decorator(func):
            return func
 
        @baroque
        def decorator_with_inner(func):
            @wraps(func)
            def inner():
                return func()
            return inner
 
        def decorator_maker_with_args(args):
            @baroque
            def inner_decorator_from_maker(func):
                return func
            return inner_decorator_from_maker
 
        @baroque
        class class_based_decorator(object):
            def __init__(self, func):
                self.func = func
 
            def __call__(self):
                return self.func()
 
 
        @simple_decorator
        def foo():
            pass
        self.assertEqual(foo.decorated_by, set(["simple_decorator"]))
 
        @simple_decorator
        @decorator_with_inner
        def foo():
            pass
        self.assertEqual(
            foo.decorated_by, 
            set(["simple_decorator", "decorator_with_inner"])
        )
 
        @simple_decorator
        @decorator_with_inner
        @decorator_maker_with_args(None)
        def foo():
            pass
        self.assertEqual(
            foo.decorated_by,
            set([
                "simple_decorator", 
                "decorator_with_inner", 
                "inner_decorator_from_maker"
            ])
        )
 
        @simple_decorator
        @decorator_with_inner
        @decorator_maker_with_args(None)
        @class_based_decorator
        def foo():
            pass
        self.assertEqual(
            foo.decorated_by, 
            set([
                "simple_decorator", 
                "decorator_with_inner", 
                "inner_decorator_from_maker", 
                "class_based_decorator"
            ])
        )
 
        @class_based_decorator
        @decorator_maker_with_args(None)
        @decorator_with_inner
        @simple_decorator
        def backwards_decorated_foo():
            pass
        self.assertEqual(
            backwards_decorated_foo.decorated_by,
            set([
                "simple_decorator", 
                "decorator_with_inner", 
                "inner_decorator_from_maker", 
                "class_based_decorator"
            ])
        )

Now we’re not completely happy with @baroque – the way it works on decorator-makers (or decorators with args, if you prefer) isn’t perfect… And it doesn’t work too well with nested class-based decorators either. So improvements and suggestions are gratefully received!

PythonAnywhere update, 18 August 2011

We’ve been working hard on PythonAnywhere over the last month or so, and have added a bunch of stuff. Our objective isn’t just to make a cool website with cool technology, that makes people go, oh, that’s cool. We want to change our users’ lives :-) What can we do to get you really excited?

Here are the features we’ve added since 15 July:

  • Scheduled tasks. If you want a Python script (or even a shell script) to run daily or hourly, you can now set it up from PythonAnywhere. The scripts run on our servers at Amazon’s EC2 datacenter, so you’ve got loads of bandwidth and CPU power to play with. Screen-scrape television listings and store them in your Dropbox… download share price data for backtesting your trading strategies… or whatever you like!
  • We’ve added Python 2.7, and upgraded our Python 3 version to 3.2.1.
  • New Python modules: mcrypt, mhash, pymc, pysal, traits, and networkx.
  • Subversion support. Because not everyone uses git or hg.
  • The editor now has a “Save & Run” button to launch a Python console running your code. There are also various other tweaks to make the editor more usable. We’re using it ourselves now while we develop PythonAnywhere, which should encourage us to make it as smooth as possible!
  • We’ve started putting together some documentation, in the form of a FAQ.
  • A bunch of useful new commands are available from bash consoles, including an updated vim with Python syntax highlighting, make, wc, awk, scp, and rmdir.
  • …and various minor enhancements to the general experience, including password set/reset and the option to log in using your email address instead of your PythonAnywhere user ID.

Coming soon, we hope to have improved support for IPython cluster computing.

What else do you think we should be working on?

PythonAnywhere

For readers that haven’t seen it yet, PythonAnywhere is our new product, currently (like Dirigible) in a very early-access beta. It’s a cloud-based Python development environment, with in-browser consoles and editors. You can read more about it on the product’s page, and sign up for the beta there — when you sign up, we’ll send you an email; reply saying that you heard about PythonAnywhere on the devblog and we’ll put you at the front of the queue for joining the beta.

The product page says a certain amount about what PythonAnywhere is, but it doesn’t explain why we’re writing it, so we’ll do that here.

Basically, PythonAnywhere has grown out of Dirigible. When we asked Dirigible’s beta testers what they were using it for, a surprising number said that it was for general Python development online. They weren’t using the spreadsheet grid at all! So we took a look at our codebase, and realised that we could re-use it and produce something specialised for that specific task. A quick search made it clear that it would also be easy to provide a browser-based console, so we decided to add that too.

However, PythonAnywhere does differ a little from Dirigible. For example, when you’re writing Python code, you generally need a filesystem to load data from and store your calculations’ results — this could also be useful for a spreadsheet, but it’s not essential. Similarly, getting data into and out of a Python development environment is different to importing it into/exporting it from a spreadsheet — so we added Dropbox integration to PythonAnywhere.

That said, while some features are more important for a spreadsheet and some more important for an online IDE, most of the stuff we’ve added to PythonAnywhere looks like it would work well in Dirigible — so perhaps in the future we’ll back-port them, or try rebuilding Dirigible on top of PythonAnywhere. Time will tell.

We’re back!

Eagle-eyed readers will note that we’ve changed the title of this blog (and also the URL, though that should be transparent to you).

Instead of focusing entirely on the technical adventures we’re having as we write Dirigible, our Python cloud spreadsheet, we’ve decided to broaden our outlook and cover what we’re doing with its sister products, Resolver One (the desktop application that inspired Dirigible), and PythonAnywhere, our new product using the Dirigible core to make Python cloud programming easy.

Keep tuned, there’s a lot of interesting stuff coming.

Upgrading to squeeze, Python 2.6, and mount and sudo oddities

We’re upgrading the Dirigible programmable cloud spreadsheet to use the latest version of Debian GNU/Linux — one of the big drivers for this is that we want to support Python 2.6 rather than the (slightly antiquated) 2.5 that we’ve supported to date. Because we practise Extreme Programming and have lots of automated functional and unit tests for our codebase, this is much easier than it would otherwise be — we’ve just needed to do the upgrade and let the tests run — all of the upgrade issues have become immediately apparent, and once we’ve sorted them out we can release the new version and be confident that it works.

Most of the issues we’ve encountered have been simple to solve — in general, it’s just been a case of making sure that the right stuff is loaded into the chroot jail we use to ensure that spreadsheets run in isolation and that everyone’s data is secure. However, there were two oddities that are worth blogging about — hopefully giving the details here will help other people doing similar upgrades.

sudo oddities

The first problem we encountered was with sudo, the Linux command that lets a process take on superuser permissions for a single task. We use sudo quite heavily, particularly when constructing the chroot jail. We discovered that when we called it from within the Dirigible back-end, it would hang; running it from the command line was fine. When run from within the server, the action that we’d used sudo to perform would complete OK, but then the action’s process would get stuck in a “zombie” state (marked as <defunct> in the output from ps) and the sudo process would never exit — it just hung. It took several hours to track down why this was, and it turns out to be a bug in sudo version 1.7.4. Ivan Zahariev discovered the problem and blogged about it here. The underlying issue appears to be a race condition in the code that handles the action process’s shutdown, and it was fixed in sudo version 1.7.5. Sadly, because Debian squeeze is new, there’s no upgraded version of sudo in its backports repository right now. However, the sudo maintainer kindly provides packaged versions of the tool for most common versions of Unix, so we were able to change our machine-build script to download the appropriate one and install it by adding the following lines to our fabfile:

    run('wget http://www.sudo.ws/sudo/dist/packages/Debian/6/sudo_1.7.5-1_amd64.deb')
    run('dpkg -i sudo_1.7.5-1_amd64.deb')

That fixed the problem.
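If you want to check whether a machine has the affected version before reaching for the new package, something along these lines might do. It’s a sketch that assumes the first line of sudo -V output looks like “Sudo version 1.7.4p4”; the function names are made up, and it only flags 1.7.4, since that’s the one release we know has the bug.

```python
import re

# The release identified above as having the hang.
AFFECTED = (1, 7, 4)


def parse_sudo_version(text):
    """Extract (major, minor, patch) from `sudo -V` style output."""
    match = re.search(r"Sudo version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no sudo version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    # Patch-level suffixes such as "p4" are ignored by the regex above.
    return parse_sudo_version(text) == AFFECTED


# Intended usage (not executed here):
#
#   import subprocess
#   output = subprocess.check_output(["sudo", "-V"]).decode()
#   if is_affected(output):
#       print("this sudo has the zombie-process hang; upgrade it")
```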

mount --bind and readonly

The other problem we encountered was with the mount command. As we mentioned in our post about chroot jails, we use mount --bind to make certain directories available inside the jail. Here’s an example of the kind of command we used before this upgrade:

mount -r --bind /usr/lib /tmp/chroot-jail-dir-20232/usr/lib

The --bind is how you tell mount to mount an existing directory somewhere else on the file system to a different directory, and the -r makes it read-only.

…except that it doesn’t. Under Linux, the -r option is overridden by --bind, and in the older Debian lenny this is done silently, so you never know that it happened. Our tests never picked this up because processes inside our chroot jail run as user nobody, so they can’t write to any of the directories in the jail anyway. That meant there was no security risk, but it also meant that when our tests checked that they couldn’t write inside the chroot jail, they couldn’t, and so we assumed that the -r flag had worked. However, if we’d added a directory that was writable by nobody to the jail, then executing user code would have been able to write to it (shades of Abbott and Costello here: “nobody can write to the directory”, “but I can write to the directory”, “yes, nobody can write to it”). Anyway, this would have been bad: writable filesystems for spreadsheets is something we intend to add, but we want to add them explicitly under our own control, not accidentally because a command doesn’t do what we ask it to!
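The gap in our tests was that “the write failed” was taken as proof of a read-only mount, when it could equally have been a plain permissions failure. A stricter check would look at why the write failed. The sketch below does that by inspecting errno, on the assumption that the kernel reports EROFS for writes to a read-only mount and EACCES or EPERM for permission problems; the helper names are hypothetical.

```python
import errno


def describe(error):
    """Turn an OSError from a failed write into a short reason string."""
    if error.errno == errno.EROFS:
        return "read-only filesystem"
    if error.errno in (errno.EACCES, errno.EPERM):
        return "permission denied"
    return errno.errorcode.get(error.errno, "unknown")


def write_failure_reason(path):
    """Try to create `path`; return None on success, else the reason.

    Note that on success this really does create an empty file at `path`.
    """
    try:
        with open(path, "w"):
            pass
    except OSError as error:
        return describe(error)
    return None
```

A test for the jail could then assert that the reason is “read-only filesystem” rather than merely asserting that the write did not succeed.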

One nice feature in the version of mount in squeeze is that it prints out a warning to let you know that your -r option has been ignored, saying warning: directory seems to be mounted read-write, so as soon as we ran our unit tests, we discovered the problem. The workaround (taken from this Server Fault answer) was simple — we have to mount the directory once using --bind, and then re-mount it with a readonly flag:

mount --bind /usr/lib /tmp/chroot-jail-dir-20232/usr/lib
mount -o remount,ro /tmp/chroot-jail-dir-20232/usr/lib

Making it a two-step process like this is fine for us; apparently in some applications it’s problematic because another process might access the directory in between the two mounts, but that’s not an issue for us because untrusted code doesn’t get run until later on.
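For anyone scripting this, here is a sketch of the two-step mount wrapped up in Python, plus a check against /proc/mounts-style text that the result really is read-only. The function names are invented for this example, the commands need root to actually run, and the check assumes the usual “device mountpoint type options …” line format.

```python
import subprocess


def readonly_bind_mount_commands(source, target):
    """The two commands from the workaround: bind first, then remount ro."""
    return [
        ["mount", "--bind", source, target],
        ["mount", "-o", "remount,ro", target],
    ]


def mount_readonly(source, target, run=subprocess.check_call):
    # `run` is injectable so the sequence can be tested without root.
    for command in readonly_bind_mount_commands(source, target):
        run(command)


def is_mounted_readonly(target, mounts_text):
    """Look for `target` in /proc/mounts-style text and check for 'ro'.

    If the target appears more than once, the last entry wins, since
    later mounts sit on top of earlier ones.
    """
    result = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == target:
            result = "ro" in fields[3].split(",")
    return result
```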

Anyway, those were the two interesting problems we’ve encountered while moving Dirigible from lenny to squeeze. It took just over a day to solve both, and the new version of Dirigible is now being thoroughly tested in our build farm; we hope to release it later on this week.

Recalc button! Conway’s Game of Life!

the new recalculate icon

To celebrate the minor, but oh-so-useful new feature that is the recalc button (also available via the F9 key), check out my new public sheet, Conway’s Game of Life:

http://www.projectdirigible.com/user/blogexamples/sheet/1124/

(NB, you can’t step through the Life cycles until you take a copy of the sheet)

A glider in the grid!

Appalled at how ugly my code is? Take a copy, write your own and share it!

[update 29/03 15:47PM - changed URL]
