Fun with Square Codes

This is a QR code:

It encodes data that can be read back by scanning it with a wide variety of devices. QR codes are widely used in Japan to advertise links to web pages, which are otherwise awkward to type on a small device like a mobile phone, but they can be used for all kinds of things. All in all it's a very cool technology, so I wanted to experiment with it.

Python and Django are my server-side tools of choice, so I started looking for a library to generate QR codes in Python. I soon stumbled upon pyqrnative, which is apparently a port of a JavaScript version. It's very compact and easy to use. Generating images was a piece of cake, and I had Django serving dynamically generated images in no time.
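Serving a generated code from Django boils down to returning image bytes with the right content type. The sketch below covers the image half using only the standard library. It assumes you already have the QR module matrix as a list of rows of booleans (True = dark), which is what a QR library computes internally; how you get that matrix out of a particular library is an assumption here, and this is not pyqrnative's own rendering code.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte length, 4-byte type, data, CRC-32 of type + data.
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))


def modules_to_png(modules, scale=4, border=4):
    """Render a square module matrix (True = dark) as an 8-bit grayscale PNG.

    `border` is the white quiet zone around the code, measured in modules.
    """
    n = len(modules)
    size = (n + 2 * border) * scale
    rows = []
    for y in range(size):
        my = y // scale - border                 # module row for this pixel row
        row = bytearray([0])                     # filter type 0 (None) per scanline
        for x in range(size):
            mx = x // scale - border             # module column for this pixel
            dark = 0 <= my < n and 0 <= mx < n and modules[my][mx]
            row.append(0 if dark else 255)       # black module or white background
        rows.append(bytes(row))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods.
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(b"".join(rows)))
            + _chunk(b"IEND", b""))
```

In a Django view the bytes could be returned with `HttpResponse(png, content_type="image/png")`. The default 4-module border matches the quiet zone the QR specification asks for; scanners tend to struggle without it.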

But how do you read these codes client-side? I had recently dabbled in OpenCV and JavaCV and learned of an easy way to access the webcam from Java, so I made a Java implementation. This step was not very difficult, so I decided to make things harder on myself by embedding the webcam part in a Java applet so it could run from a website.

I had a bit of trouble here, as accessing native libraries (using Java Native Access) proved quite difficult. I had a hard time figuring out where JavaCV was loading its DLL files from, and it took me a good half hour to debug. A list of OpenCV paths is hardcoded in the JavaCV source code. I wanted to include the DLL files dynamically, which seemed near impossible. What's worse, removing or adding paths did not seem to make any difference to whether JNA could find the DLLs. I finally tracked it down to one folder: /usr/local/lib was where my libraries were being loaded from. Which is damn peculiar, because that's a Linux folder and I'm running Windows 7. I suspected Cygwin at first, but that wasn't it: when the /usr/local/lib folder is specified, JNA searches all folders listed in the system property jna.library.path, which contains the system path by default. Very obscure...
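The behaviour that tripped me up is ordinary search-path lookup: the loader walks an ordered list of directories and takes the first match, so which DLL you get depends on that list rather than on where you think the files live. Here's a small Python analogue of that lookup. It is an illustration, not JNA's actual code; on the Java side the relevant knob is the jna.library.path system property (for example `-Djna.library.path=...` on the command line), set before the library is loaded.

```python
import os


def find_library(filename, search_path):
    """Return the first existing `filename` in an os.pathsep-separated search path.

    A simplified analogue of how a native-library loader resolves a name:
    directories are tried in order and the first hit wins.
    """
    for directory in search_path.split(os.pathsep):
        if not directory:              # skip empty entries, e.g. a trailing separator
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None                        # not found anywhere on the path
```

Printing the path the lookup actually settles on is the quickest way to catch a surprise like /usr/local/lib mapping onto the system path.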

Having figured that out, I could continue building the applet. I had to jump through a bunch of hoops to get it to work, namely signing the applet (which is a very good thing, considering it runs native code that accesses a webcam!) and figuring out the right tags to put in the HTML. HTML tags for applets are a big bloody mess; there's no single tag that works in every browser. There's the deprecated applet tag, an object tag for IE and an embed tag for Mozilla (which also works in Chrome). I chose the embed tag for now.
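Since the page is rendered by Django anyway, the applet markup can be produced in one place. Below is a minimal sketch of a hypothetical helper that builds the embed tag. The attribute set (type, code, archive, width, height, with applet parameters passed as extra attributes) follows the common Java-plugin convention for embed, but double-check it against your plugin version; the class and JAR names in the usage are made up.

```python
from html import escape


def applet_embed_tag(code, archive, width, height, params=None):
    """Hypothetical helper: build an <embed> tag for a (signed) Java applet."""
    attrs = {
        "type": "application/x-java-applet",
        "code": code,            # applet class to start
        "archive": archive,      # signed JAR containing the applet
        "width": str(width),
        "height": str(height),
    }
    # With embed, applet parameters are passed as additional attributes.
    attrs.update(params or {})
    rendered = " ".join(
        '%s="%s"' % (name, escape(value, quote=True)) for name, value in attrs.items()
    )
    return "<embed %s>" % rendered
```

For example, `applet_embed_tag("QrScanner.class", "qrscanner.jar", 320, 240)` yields a single tag that a template can drop in with autoescaping turned off for that value.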

There's a nice method in Java's AppletContext class, showDocument, which lets you change the URL of the browser's document. Passing it a javascript: URL also lets you execute arbitrary JavaScript, so you can call a JavaScript function that makes an Ajax call back to the server, which can then update the web page! It's all very roundabout and takes three languages and a whole bunch of libraries to get going, but it's quite cool once it works. Here's a video of the result:
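The last hop of that chain is a plain server endpoint. Here's a minimal sketch of what it might do with the decoded text, assuming the JavaScript function POSTs it as a form field named `code` (the field name and the reply shape are assumptions). It is written as a plain function over the raw request body so it runs without Django.

```python
import json
from urllib.parse import parse_qs


def handle_scan(body: str) -> str:
    """Parse a form-encoded body like 'code=...' and return a JSON reply.

    Hypothetical endpoint logic: the page's Ajax callback reads `ok` and
    `code` from the reply and updates the document.
    """
    fields = parse_qs(body)                   # also percent-decodes the values
    codes = fields.get("code", [])
    if not codes:
        return json.dumps({"ok": False, "error": "missing 'code' field"})
    return json.dumps({"ok": True, "code": codes[0]})
```

In a Django view the same logic would read `request.POST` and return the string in an `HttpResponse` with `content_type="application/json"`.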

Posted in Tech

jQuery.mobile and Django: nice!

I'm working on a little project in Django, and while learning about writing web pages for mobile devices I came across jQuery Mobile. It's a framework that runs on top of jQuery and makes it extremely easy to create web pages that look good on mobile devices. I really like working with it. It does have some quirks, though.

Today I got stuck on page navigation. I was trying to submit a form through a POST request, yet jqm seems to cache pages permanently once they're loaded, so no content was getting updated. To make matters worse, I could not request the same page twice after a POST request; the whole thing just hung. This does seem to be a known problem, and there are some workarounds. The easiest one is to disable Ajax form submission in jqm. I did this, and then changed the form's redirect to point to a jQuery Mobile URL (the one with the hash). This seems to have fixed my problem in most cases, and somehow also preserves the navigation history. Very nice. The only problem I have now is that in some cases the page history still contains a dialog page that shouldn't be there.
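The redirect trick is simple on the Django side: after handling the POST, send the browser to the hash-style URL instead of the plain one. A minimal sketch of a hypothetical helper follows. The `/#/some/page/` format is an assumption; the exact fragment form differs between jqm releases, so check what your version shows in the address bar.

```python
def jqm_hash_url(path, root="/"):
    """Hypothetical helper: put an absolute app path into the URL fragment.

    Assumes jQuery Mobile page URLs of the form '<root>#<path>',
    e.g. '/#/lists/3/'.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % (path,))
    return root + "#" + path
```

A view would then finish its POST branch with something like `return HttpResponseRedirect(jqm_hash_url("/lists/3/"))`. Because Ajax form submission is disabled, the browser does a real POST, follows the redirect, and jqm picks up the page from the hash.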

I've been using jQuery Mobile for a couple of days now and that's really the only problem I've encountered. Other than that I've been extremely pleased working with the API. Functions are where you expect them to be, and all the things I wanted for my website were already included. Hats off to you, jQuery Mobile. May you soon get out of the alpha stage and fix the dynamic page issue.

 

Posted in Tech

Revealing a bit more about MediaList

What is it about programming that makes it so fun to do at night? Or so terrible to do in the morning? Maybe I'm just a night person.

I ran into an uncomfortable realization yesterday while working on MediaList. Since switching from Java to Python I've focused on keeping my code clean, concise and generally sensible. I decided to prioritize readability and cleanliness over performance, something I seldom if ever do in Java. Since this is a hobby project, I thought I might get away with it. The future will prove me right or wrong, but I'm already starting to have my doubts.

To generate a large volume of high-quality content for my site dynamically, I'm planning to let the site's users input URLs of other websites. MediaList will load the site, scrape all the relevant info from it and insert it into a database. I've mentioned before that the site lets you rate stuff, so here's an example: you can rate a movie by pasting a link from IMDb.com. MediaList will then fetch information from the page on IMDb, like the movie's title, the year it was released and its running time. I chose not to let users add this content manually for two reasons. One: I don't have users. (And I'm sure as hell not going to add all that crap myself.) Two: letting people input things manually will surely mean a lot of mistakes. With a large community that's not an issue, since moderators will notice the mistakes and fix them, but with a small userbase (or none at all) it's just a lot easier to scrape the data from somewhere else.
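Before fetching anything, it helps to check that a pasted link really is an IMDb title page and reduce it to a stable key. IMDb title ids have the form `tt` followed by digits, so two different links to the same movie (with or without `www.`, with tracking parameters) collapse to one entry. This is a minimal sketch; the accepted host names are an assumption.

```python
import re

# Accepts http(s) links on imdb.com, www.imdb.com or m.imdb.com (assumed hosts).
IMDB_TITLE_RE = re.compile(
    r"^https?://(?:www\.|m\.)?imdb\.com/title/(?P<id>tt\d{7,8})(?:[/?#]|$)"
)


def imdb_title_id(url):
    """Return the IMDb title id (e.g. 'tt0133093') from a title URL, or None."""
    match = IMDB_TITLE_RE.match(url.strip())
    return match.group("id") if match else None
```

Keying scraped data on the id rather than on the raw URL keeps duplicates out of the database.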

This is where I ran into a lot of problems. IMDb does not have an official API, and the unofficial ones don't include the IMDb URL belonging to each movie, which is vital for my purposes. I decided to take a rather risky step and parse the raw HTML from IMDb directly. It's risky because the markup can change at the whim of the people at IMDb, and when that happens I'll have to update my parser. IMDb, if you're reading this, a public API (preferably using JSON) would be awesome.

After messing around with Python's htmllib and sgmllib I realized that they both sucked, and that if I wanted to get something done quickly (in dev time, not in processing time) I'd need a DOM parser. After sniffing around the net a bit I quickly found BeautifulSoup, a wonderful piece of code that builds a DOM tree and provides search functions for *ML documents. The code I wrote using BeautifulSoup is easy to understand and easy to modify, quite unlike the turd I wrote with sgmllib.
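To show why the DOM approach is nicer, here's the callback style for a tiny job: pulling the title and year out of the page's title element. BeautifulSoup isn't in the standard library, so this sketch uses Python's html.parser instead; note the bookkeeping flag that a tree-based parser spares you (with BeautifulSoup the title text is roughly `soup.title.string`). The "Name (Year) - IMDb" title format is an assumption about IMDb's markup and exactly the kind of thing that can change, so the parser fails loudly instead of guessing.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the document's <title> element."""

    def __init__(self):
        super().__init__()                 # convert_charrefs=True decodes &eacute; etc.
        self._in_title = False             # state a DOM parser would track for us
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data             # text may arrive in several pieces


# Assumed format: "Movie Name (1999) - IMDb".
TITLE_RE = re.compile(r"^(?P<name>.+?) \((?P<year>\d{4})\)")


def parse_movie(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    match = TITLE_RE.match(parser.title.strip())
    if match is None:
        raise ValueError("unexpected <title> format: %r" % parser.title)
    return {"title": match.group("name"), "year": int(match.group("year"))}
```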

Building a model of a single IMDb page with BeautifulSoup takes mere milliseconds on my system at home. The bottleneck lies in fetching the URLs from IMDb. While the time it takes for a single URL is acceptable, importing a list of several hundred URLs takes painfully long, and is not something I can make users wait around for. Fortunately I only have to load each URL once, after which the information is cached inside MediaList. If the information on IMDb was incorrect (which, after testing this feature, turns out to be the case more often than I imagined), the information inside MediaList will also have to be updated. Manually.
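Two things soften that bottleneck: never fetch a URL twice, and overlap the network waits when importing a batch. Below is a minimal sketch of both. The `fetch` callable (download plus parse) is an assumption, and an in-memory dict stands in for MediaList's database cache.

```python
from concurrent.futures import ThreadPoolExecutor


class ScrapeCache:
    """Fetch each URL at most once; later lookups come from the cache.

    `fetch` is any callable url -> info (e.g. download the page and parse it).
    """

    def __init__(self, fetch, max_workers=4):
        self._fetch = fetch
        self._max_workers = max_workers
        self._cache = {}                   # stand-in for the database table

    def get(self, url):
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]

    def import_many(self, urls):
        # Deduplicate while keeping order, and skip anything already cached.
        missing = [u for u in dict.fromkeys(urls) if u not in self._cache]
        # Fetching is network-bound, so a few threads overlap the waiting.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            for url, info in zip(missing, pool.map(self._fetch, missing)):
                self._cache[url] = info
        return [self._cache[u] for u in urls]
```

Keeping `max_workers` small is both polite to IMDb and less likely to get you blocked. For several hundred URLs this still won't be instant, so running `import_many` in a background job rather than inside the user's request is the more realistic setup.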

Still, after all the work on the MediaList concept, this is the only potential performance bottleneck I've encountered. I'm confident that, this point excluded, the site will work fine with 1000+ active users, even on shared hosting. For all I know the site might bog down at 50 users, though; I haven't tested that yet. Either way, I feel satisfied with the code I wrote. It's always a trade-off. Pretty code, high performance, rapid development: pick two. I'm glad I chose a different combination for this experiment.

 

 

 

 

Posted in Tech