Tuning gunicorn and Django performance on Heroku with blitz.io

Before starting to promote your startup to the public you really need to make sure that your landing page can handle any kind of traffic that will come your way. Blitz.io is my tool of choice. Heroku integrates with blitz and sets you up for a free account right from the start. Very nice.

Here's what Flawless.QA's landing page performance looked like, serving a page directly from memcached:

Note that we're not tuning Django here, or database performance. Our Postgres database is running on a different EC2 instance behind pgbouncer and should not have any effect on the results. This guide will show you how to set up your own Postgres instance. I suspect that using Amazon's RDS will be even easier than setting up Postgres ourselves, but that's something to look into at a later stage.
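For anyone unfamiliar with what "serving a page directly from memcached" means in practice, here's a rough sketch of the idea: look the rendered page up in the cache, and only render it when it's missing. This is not our actual view code. FakeCache is a made-up, dict-backed stand-in for a memcached client, and cached_page and render_landing_page are hypothetical names used only for illustration.

```python
import time


class FakeCache:
    """Dict-backed stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)


def cached_page(cache, key, render, timeout=300):
    """Return the cached page if present; otherwise render it once and store it."""
    page = cache.get(key)
    if page is None:
        page = render()
        cache.set(key, page, timeout)
    return page


render_calls = []


def render_landing_page():
    # Assumed placeholder for the expensive template rendering step.
    render_calls.append(1)
    return "<html>landing page</html>"


cache = FakeCache()
first = cached_page(cache, "landing", render_landing_page)
second = cached_page(cache, "landing", render_landing_page)  # served from the cache
```

The second call never touches the render function, which is why the database and template layer should not show up in the benchmark numbers.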

Based on Heroku's recommendations, the setup above is using gevent so we can have asynchronous workers. This seems to be a point of controversy, as Heroku's docs mention gevent but don't actually show you how to install it and set it up in your Procfile. Here's what our initial setup looks like:

web: python manage.py run_gunicorn -b 0.0.0.0:$PORT -w 4 -k gevent --max-requests=500 --preload

The easiest thing to tune seems to be the number of workers, so let's try 9 workers instead of 4:

Well, that went to shit real fast. Performance is worse than with 4 workers, and if there are many concurrent requests they all fail, not just a small portion of them, which is what happened when we had 4 workers.

Now let's see what happens if we turn gevent off:

Perfect! Response times fluctuate a bit around the 200 requests per second limit, but every single request is served quickly and correctly.
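If you'd rather not juggle command-line flags in the Procfile while experimenting, the same settings can live in a gunicorn config file, which is plain Python. The sketch below mirrors the flags from our Procfile line with gevent switched off. The file name, the fallback port of 8000 and the worker count of 4 for the gevent-less run are my assumptions here, not something measured above.

```python
# gunicorn.conf.py (assumed file name): a sketch of the Procfile flags as settings.
import os

# Heroku passes the port in the PORT environment variable;
# 8000 is an assumed fallback for running locally.
bind = "0.0.0.0:" + os.environ.get("PORT", "8000")

# Number of worker processes (-w). Try 4 and 9 and compare.
workers = 4

# Worker type (-k). "sync" is gunicorn's default; "gevent" enables async workers.
worker_class = "sync"

# Restart a worker after it has served this many requests (--max-requests).
max_requests = 500

# Load the application before forking the workers (--preload).
preload_app = True
```

You'd then point gunicorn at the file with its -c option and change one value per benchmark run, so you always know which setting caused a difference.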

So why does gevent not give better performance on Heroku? In my test case there was nothing infrastructure-related that should have posed a problem, as the response was fetched from memcached directly, and the database was on a different instance. I'm guessing it might be because the overhead of gevent is a bit too much for a Heroku dyno to handle, especially when you're using a large number of worker processes (in gunicorn). I also wonder if this result might be more in favor of gevent if we count web page requests that include loading the static files referenced on the page. In our case this is a non-issue though, as our static files are hosted directly on Amazon S3, so I can't see any reason to turn gevent on when you're running a similar setup on Heroku.

But don't believe anything you read on the internet. Benchmark it yourself for your particular use case.
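If you don't have a blitz.io account handy, a crude version of the same experiment can be sketched with nothing but the Python standard library. This is not a replacement for a proper load-testing tool and it is not what produced the results above. The function names, the request counts and the localhost URL in the usage note are all placeholders I made up.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=5.0):
    """Fetch url once; return (seconds_taken, succeeded)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 300
    except Exception:
        # Timeouts, connection errors and HTTP error statuses all count as failures.
        ok = False
    return time.perf_counter() - start, ok


def summarise(results):
    """Summarise a list of (latency, ok) pairs into a small dict."""
    latencies = sorted(latency for latency, ok in results if ok)
    failures = sum(1 for _, ok in results if not ok)
    summary = {"requests": len(results), "failures": failures}
    if latencies:
        # Upper median of the successful requests.
        summary["median_s"] = latencies[len(latencies) // 2]
        summary["max_s"] = latencies[-1]
    return summary


def run_benchmark(url, total_requests=100, concurrency=10):
    """Fire total_requests GETs at url, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * total_requests))
    return summarise(results)
```

Calling run_benchmark("http://localhost:8000/") against your own app, once per gunicorn configuration, gives you a failure count and rough latencies to compare. Only point it at servers you own.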

Posted in Tech

Thoughts on post-modern web development

We're getting ready to (p)(re)launch Flawless.QA with some new features that I am quite excited about. That sounds a bit like a marketing phrase, but weirdly enough I am really genuinely excited. Brian and I worked hard to get the project to the state that it's in now, and it's about to pay off. Here are some short programming-related thoughts on the stack that we're using.

Django-south: I can't believe I've lived so long without using this. Well, actually, I can, because at Potato London we're mainly on the appengine platform, which doesn't play nicely with South. That's a terrible shame because django-south is one of the two great revelations for me during this project. Database migrations have never been easier.

Django-celery: this is the other great revelation for me. Celery makes it extremely easy to set up background tasks and periodic tasks. It was wonderfully easy to set up and works brilliantly. Django-south and django-celery are the two must-have modules of any serious Django project.

Amazon Web Services: scares the shit out of me still, but I can taste the potential. It tastes like mango. The amount of free things that Amazon gives you is amazing: a free machine with an operating system of your choice, free super-fast static file storage, 2000 free email sends per day, 30GB of storage, a 20GB database with another 20GB for automatic database backups. Seemingly the only catch is that the free tier has a bandwidth cap of up to 15GB, after which they start charging you money. We haven't reached that yet so I've really no idea how much it's going to cost us.

Heroku: I feel a bit sorry about Heroku. We signed up enthusiastically for their free services and it helped us get our site in the air just a tad bit faster than if we had to configure it from scratch on AWS, but we don't even have paying customers yet and I already feel constrained on Heroku. The biggest limitation is the database. The free Postgres database Heroku offers is useless with a 10,000-row limit, and the next step up is 'not recommended for production' and artificially crippled. Even if you did go for that, it would cost you $8 per month and you get a worse product than with the free Amazon Relational Database Service. So yes, the database really is the big blocker for Heroku.

Note that I'm still a big fan of Heroku. Their developer-oriented product is beautiful to behold and calming to interact with. Finding out how to do something on Heroku is an order of magnitude easier than it is in Amazon Web Services. AWS adds a lot of clutter and worry to my already messed up brain, but Heroku takes it away. Too bad it also takes away a lot of your money.

Cloudflare: not sure if want. We've set this up as our dns server because the nameservers of the company we bought the domain from seem terrible, but Cloudflare introduces a lot of things that we're not sure of, in particular the page caching mechanism. It seems like it could end up being a danger, and given that dns-level changes take a long time to propagate, there's no easy solution if we mess something up. I'll need to read up a bit more on what they do before I can decide what to think of Cloudflare.

Postgres vs MySQL: One of the eternal debates. We went with Postgres because Heroku gives it to you by default, and Django is pretty much database-agnostic. It's my first experience with Postgres, having come from a long background in MySQL. So far I have no problem with either, but I know a lot of MySQL tricks related to configuring it to act more like a proper RDBMS (something which Postgres does by default), and also related to scaling and backing it up. It seems that we won't have to worry about this at all if we go with Amazon's RDS though, which is essentially a managed MySQL database service. Migrating the data worries me...

Appengine: I mention this here because my previous project was developed on Google Appengine. I can't help but wonder how that project would have turned out if we had not used appengine. Sure, you would lose a lot of the automatic scalability that appengine provides, but in practice, having that automated scalability meant that I had to spend a lot of time learning appengine's very specific APIs that are totally useless elsewhere. If we hadn't used appengine I would have had to spend that time on a much wider variety of topics, which would surely include database tuning, configuring webservers for static file hosting, managing virtual machines etc. And that knowledge would have carried over a lot more easily into a next job or a startup. That said, appengine did teach me a lot, especially on how to scale things, and that will definitely come in handy in the future.

Posted in Tech

A two-man team

Not much time to blog, thought I'd leave this short thought here.

At my job at Potato I work with a team of anywhere between 6 and 10 people, depending on the day of the week and the position of the moon. This situation is comfortable for me because everyone has their own little corner of the code to work in, and no one is single-handedly responsible for a massive bit of code. Everyone can cover for each other if they have to, which happens often as people go on holiday or take breaks.

I've done personal projects all by myself, doing everything from setting up servers all the way down to writing the text that users will see and styling it up to look pretty. This is comfortable too because I have total control over everything. I choose what to spend time on, I choose what to build.

But a team of two people working closely together on Flawless.QA, that's new to me, and I'm still adjusting. Because we're only two people we both feel responsible for the entirety of our codebase, yet because the target we set for ourselves requires so much work it is pretty much impossible for one person to do everything himself. The frustrating part I am feeling though, is that because we're only two people, it almost seems like I could do it myself. Of course it only seems that way, and I know it's not true. It's the effect of two people working closely together and knowing largely what the other one is up to. Brian deserves full credit here as he put in a lot more hours than I did.

The product is starting to take shape. We'll have a new release at the end of this week. It'll be full of cool stuff, and many behind-the-scenes improvements that make our infrastructure a lot more sustainable. More on that later.

Posted in Tech

Flawless.QA: the making of

It's been a while since I've updated. I've been meaning to write something ever since faux-launching (faunching?) Flawless.QA last week, but have been occupied with many other things. I thought I'd share here how we went from idea to landing page in about a week, and the troubles that accompanied it.

The stack we're using is essentially Python+Django on top of a gunicorn webserver, running on Heroku. We're using Unfuddle for project management and git repositories, and are currently using Cloudflare for our DNS server. For e-mail we're using Mailgun. Although we're only two people we've just started using Hipchat, which offers lovely integration with git + unfuddle. The beauty of this setup is that all of the components scale with money. Need better webserver performance? Just throw money at Heroku. All the services listed here offer you a free account to get started, and you can scale it up from there without spending too much time.

There are limitations, of course, and you'll hit them sooner than you think. Mailgun's free account only lets you send 200 e-mails per day, which, if you're an e-mail-heavy business like us, will not last you long. There are alternatives, such as Google Apps, but that's not free. The same thing goes for Cloudflare: I've tested the performance (using blitz.io, a great product for performance testing) of Flawless.QA on the actual domain via Cloudflare's DNS, and via Heroku's subdomain at herokuapp.com, and the performance of Cloudflare is quite appalling compared to the direct access. I'm not quite sure why yet. Cloudflare offers tons of little micro-optimizations for your website, but I'm guessing they might prioritize their paying customers, resulting in worse performance for the free accounts.

The reason we had to change DNS servers and mail servers for our domain was that the provider we bought the domain from seemed to have trouble keeping the site up. Every once in a while we would try to access the domain via their nameservers and get a blank page or a 404 as a result. Unable to pinpoint the exact reason, we decided to use the Cloudflare nameservers instead, which solved the issue. During the process we learned a lot about CNAME records and A records. You can go 90% of the way to launching your startup, but until you buy the actual domain name and set it up you'll never run into these issues. A lesson learned.

We ran into e-mail issues as well, as the provider we bought the domain from would no longer allow us to set up e-mail forwarding after we switched nameservers. The old redirects still work, but it's definitely something we'll need to address in the near future.

Besides e-mail there are a lot of other shortcuts we took just to get a minimum viable product out there as soon as possible. From a technical point of view we've released an entirely unoptimized version of the site: no minifying and bundling of resources, no public caching, not even serving the static files from a fast location. Fortunately these are some of the things that Cloudflare addresses, so we've managed to get away with it so far.

As I am learning, it's one thing to get a minimum viable landing page out there, but getting a minimum viable paid subscription is a whole other thing. The problem we set out to solve, automating content quality checking for websites, is a fairly technical one, but fortunately for us a lot of it is behind-the-scenes code, which will not affect users if it breaks from time to time. If I had to do it all over again I would probably have started working on the paid accounts sooner. But like any startup, it's hard to know that before doing one, and we're learning as we go along.

Posted in Tech

Flawless.QA: automatic error finder for your website

Your website. Flawless.

Today is the day I get to reveal what we've been working on for the past two weeks. And here it is: Flawless.QA is a service that automatically checks pages on your website and notifies you via e-mail about mistakes whenever there's an update. The mistakes that we can currently detect are spelling and grammar errors, and broken links, but we're working on extending that.

At the moment we only do about one check per day, roughly. We're already making plans to improve upon this in the future, but we wanted to get our idea out to the public as soon as possible. Please give it a try, and let us know what you think. I promise I'll reply to you personally if you mention this blogpost ;)

 

Posted in Tech

Sustainability

Finding an idea. Choosing from a gazillion seemingly good ideas. Finally settling on a reasonable idea, becoming very enthusiastic, then having all that enthusiasm blown away again by a little market research. Rediscovering the enthusiasm after realizing there's an area of the market where our product would fit. Getting started on the technical bits and realizing that the problem was way, way more difficult than you initially imagined. Thinking you don't need an office because you can work from the library, then discovering that there's wifi interference and a terrible 3G signal making it impossible. Bunking up with your startup-partner-in-crime for two weeks in a very tiny room with a worse-than-average internet connection. This is my life.

Or rather, this has been my life for the past two weeks. I'm still feeling uncomfortable while in the 'idea' phase of the whatever-it-is-that-we're-going-to-do, but I guess that's to be expected. A lot of startup articles talk about how talk is cheap, ideas are cheaper and what really matters is that you build stuff. I can't help but think it's the opposite. I've seen and read about dozens of startups, either successful or not, and I'm fascinated by how startups decide on their idea. Paul Graham talks a lot about how Y Combinator invests in people, not ideas, and I can totally see that. In our case though, we're doing this on our own, and we're in charge of our idea.

It's a tricky business, deciding on an idea, but that did not surprise me. What did surprise me is how much time we're spending on homing in on an implementation of an idea. Icky questions such as what audience to target come to mind. Targeting a large audience sounds exciting from a development point of view as you get to make something big and generic, which is usually fun to program. But from a business point of view it means competing with tons of existing companies, meaning you'll have to be a lot more competitive. On the other end of the spectrum you could have a product that's targeted towards a very small audience (e.g. one-legged midgets from Lithuania) but that won't get you a lot of customers. Balance is the key, as always.

One thing I don't like about the online startup cult is how it always tells you to 'fail fast'. Some people seem to take this as an absolute rule, but I think there are exceptions. Sure, if you've got five ideas you're reasonably (but not massively) excited about, it makes sense to get your minimum viable product out there for the first one, give up as soon as you get the first indication that it won't work and then move on to the next idea. Do this a couple of times and you'll be called a 'serial entrepreneur' as if it was a title of importance. I'd much rather have an idea I believe in so strongly that I would try variation upon variation of it until I succeeded. I do realize that not all ideas lend themselves to this though. It will be interesting to find out what kind of idea ours will turn out to be. (I'll save the reveal of the actual idea for a post that is a bit less meta about startups.)

We've arranged for office space. The skeleton of our app is set up, but completely empty. We have a plan to monetize our idea and test our assumptions of why it's a good idea. We're all set. The fun begins tomorrow!

Posted in Tech

AngelHack thoughts on SchoolSeer

Last weekend was AngelHack London. A friend talked me into signing up, one thing led to another and before we knew it we were five people dedicated to improving communication between parents and teachers. 24 hours of hacking later we had come up with SchoolSeer, a site that allows teachers and parents to send messages and view grades and attendance records of their children. You can try out the demo version here: http://www.colorfulwolf.com/seer. The login is guest/guest.

We started out five fresh people at the Bloomberg building in Angel. One of us had something urgent come up, leaving us with four people: three back-enders, one front-ender, no designer/ux guy. We were busy discussing our database model when lunch was announced, and 10 minutes later all the food was gone. The organizers mentioned there'd be more food in an hour but we didn't wait for that, which turned out to be the right decision as that was soon gone too.

Since we arrived quite late all the good spots were taken, and we had to settle on a table to the side of the main presentation area, next to a giant speaker. It could've been worse though, as the main hall with all the developers was even louder. We were just starting to make progress when the internet connection died, and it kept dying on us every couple of minutes. The organizers promised improvement, but we decided to ditch the noise and head over to Potato HQ, where we did all the hard work.

As it grew later and later our dev speed slowed, and the early hours of the morning were the worst. Right after that we started preparing the presentation and generating dummy data that would impress the jury, but that took a lot longer than expected. Or perhaps we underestimated the fatigue. In any case, when we were ready to head back to the AngelHack building there were only two of us left, tired and worn out, but psyched up to do our presentation.

There were technical difficulties yet again. My friend's old MacBook did not have the right connector to connect to the projector, and even the adapter dongle thingie he brought did not work. My laptop had the right connector but could somehow not show an image on the projector. The guy who helped me try and set it up blamed it on my laptop rather than admitting they should've sorted out a proper projector. Instead, we could only display a blurry image by pointing a video camera at my laptop's screen. Ugh. The projector had an 800x600 resolution so it might not have been a big loss.

Despite the blurry image, Brian managed to get the point across pretty well as I drove the laptop. They let us talk on for a bit longer than was the rule, which we took as a sign that the jury was interested. We left the jury a note apologizing for the poor presentation, suggesting we show them in person instead, but haven't heard back from them. Shortly after our presentation I headed home as I was dead tired. Brian stayed behind a while to try and get in touch with the jury, but apparently they disappeared soon after the result was announced, delayed by hours.

The whole experience was definitely memorable, and I was astonished at what our team managed to produce in such a short time. But to be perfectly honest, I don't quite get the point of the hackathon. It's all about nerf guns, shitty music, a crappy internet connection and coding in a sub-optimal state. I'm glad to have had the choice to relocate to a nearby office to get some actual shit done. I think our end product reflects this mindset. Although I do think we've got to spend more time on presentation in the future.

Our efforts will not go to waste. We've learned a lot of things from the hackathon, and we'll strive to get more (well, any) feedback on our project, and hopefully propose it to schools and investors. The best is yet to come :)

Posted in Tech

Perceived Programming Speed

There are two ways of being a fast programmer: you're either a really fast coder, or you have the talent and experience to not make mistakes. I know of one developer who is absolutely unbeatable in terms of sheer speed, but he does not excel in testing, so sometimes bugs seep through, which makes his bottom-line speed lower. Another programmer has a decent to fast developing speed, but I have never caught him making even a single mistake, so his perceived speed is incredibly high.

As for myself, I'm probably somewhere in-between. My actual developing speed is not very fast, but I tend to error-check myself a lot so I don't release a lot of errors on others. I'd also like to think that I'm programming for maintenance, which is good if you're maintaining a codebase in the long term, as you'll be able to do future feature requests way faster when your foundation is solid.

My final days at Potato are approaching. I'm leaving behind a project that I've worked on for over a year, and that has pretty much been redone from the ground up, section by section. My final tasks here are to make the foundations even more solid so that the people I leave behind will have an easy task building on that foundation. It involves tearing down some things, but the end result will be more solid. And that's how it should be. Robustness does not necessarily exclude speed, but quite often speed excludes robustness.

Posted in Tech

AngelHack here we come!

It's been a long night. I've had less than an hour of sleep. Together the four-and-a-half of us managed to build a quite amazing project. Doing the final touches now, and then preparing for the presentation. Awesome!

Posted in Tech

Changing things

I'm too tired to write this in a nice way. Last Sunday I came back from a great trip to Dorset. The next day, I quit my job and announced to my boss that I'm planning to do a startup with my friend Brian. We don't have an idea yet. It must be premature to quit your job when you don't know what you're going to be doing, but it seems like the right thing to do to me.

This weekend is AngelHack. Some Potato friends, Brian and I have registered to participate. We're fairly sure of what to build, but still pivoting at the last moment. A lot of things can happen. I'm going to be thinking about startups every evening after work, since Brian is staying at my place. Work is a ton of coding as well, and AngelHack this weekend will be a 24 hour coding marathon. I feel a brain explosion coming up.

So many ideas. So much more to come.

Posted in Daily Life , Tech , Thoughts