12 posts tagged “development”
Aral Balkan is trying to run the website for his conference on Google App Engine, the same platform that snaptrip uses. In October, he posted twice on Twitter:
Great, you have no control over how Google App Engine caches data requests. Pulling in RSS feeds? Forget about it! (It uses Google Proxy and you can't tell it not to cache a feed or set the cache duration.)
I'd also noticed this, because the snaptrip login page (which does double-duty as an FAQ and news page - maybe I should rename it?) pulls in entries tagged 'snaptrip' from the Atom feed of this very weblog, and after my third post it failed to update for a good half a day. I wasn't that bothered, and didn't bother to double-check the documentation, which does clearly state that
App Engine uses a HTTP/1.1 compliant proxy to fetch the result.
Evidently this is the same problem Aral has, and as usual, Tom Insam had an answer. It's from a slightly different direction (working with Google's Open Social containers), and as he said, it's what "everyone has done for years to bust caches you don't control": append an incrementing (or random) parameter to each request, which should mean that you're not hitting the cache. Having finally written a new blog post about snaptrip, I can confirm that this approach works. I'm not sure I'll leave it in - it seems a bit rude - but if timeliness is important, you might want to do the same.
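As a rough illustration of the cache-busting trick, here is a minimal sketch in modern Python 3 (the App Engine runtime of the time used Python 2, where these helpers lived in different modules). The parameter name `_cb` is a hypothetical choice; any name the remote server ignores should do.

```python
import itertools
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Module-level counter so every call produces a fresh value.
_counter = itertools.count(1)

def cache_bust(url, param="_cb"):
    """Return url with an extra, ever-changing query parameter.

    "_cb" is a hypothetical name; the remote server is assumed to ignore it,
    while the caching proxy sees a URL it has never fetched before.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any earlier cache-busting value, then append a new one.
    query = [(k, v) for k, v in query if k != param]
    query.append((param, str(next(_counter))))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

On App Engine, the result would then be passed to `urlfetch.fetch()` in place of the bare feed URL. A random value would work too; a counter just guarantees no two requests in the same process collide.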
It also occurs to me that, if every call to urlfetch is cached for some time, repeated calls via API libraries may give somewhat unexpected results (although they're more likely to have changing arguments, anyway). Be careful out there.
Last week Safari 3.2 was released, with the usual minimal release notes: "This update includes stability improvements and is recommended for all Safari users." The security notes were somewhat more forthcoming, but even there, not everything is covered, since as well as bug fixes, 3.2 quietly added support for two big security features: EV SSL and Google Safe Browsing.
Neither of these changes, obviously, is covered in the release information, but since the (very good) MacJournals writeup of the anti-phishing features was reposted at Macworld, there's been a small whirl of further commentary, especially as Safe Browsing involves some data collection for Google. Most of the (sensible*) concern has been raised because Apple's terms and conditions, unlike those of Firefox (which also uses the Google Safe Browsing API), allow Google to use the data sent while browsing with this feature for any purpose, not merely to enhance that particular service. This might not be so bad were it not also for the fact that the Safe Browsing checks fetch and send data by default.
Personally, though, I can't say I'm bothered by either of these. I'm sure Google get far more useful information from searches and opt-in service usage than they get from partial hashes returned when browsing to potentially hacked sites. As for defaulting to using the service, well, both Chrome and Mozilla also do that, and as with Firefox, Safari offers a preference to disable phishing detection.
What is more surprising to me is that so few people have connected the release of 3.2, and its emphasis on security over features, to the removal of Safari as a "safe" browser from PayPal's list in February:
"Apple, unfortunately, is lagging behind what they need to do, to protect their customers," [PayPal security chief] Barrett said in an interview.
I have little doubt that behind-the-scenes back-and-forth between Apple and PayPal (and similar organisations pushing these changes) led Apple to release this sooner rather than later, in the 3.0 branch instead of waiting for Mac OS X 10.6 and Safari 4. Perhaps a more sensible question for people to raise is whether EV SSL and Safe Browsing are actually useful, or merely security theatre. Now there's a well-researched comment piece I'd like to see.
* There's also a lot of kneejerk "OMG Google haz my datorz!" nonsense, but reading the article makes it clear that only hashes of URLs are checked, and even that's only when a partial hash is matched against a hash of your current URL.
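To make the footnote's point concrete, here is a simplified Python sketch of a prefix-then-full-hash lookup. It assumes SHA-256 hashes and 4-byte prefixes, as in Google's Safe Browsing v2 protocol; real clients also canonicalise the URL and check several host/path variants, which is omitted here. `fetch_full_hashes` is a hypothetical stand-in for the network request, not a real API.

```python
import hashlib

PREFIX_LEN = 4  # Assumed: 32-bit hash prefixes held locally by the browser.

def url_hash(url):
    # A real client canonicalises the URL first; this sketch hashes it as given.
    return hashlib.sha256(url.encode("utf-8")).digest()

def needs_server_check(url, local_prefixes):
    """True only if the URL's hash prefix appears in the local prefix list."""
    return url_hash(url)[:PREFIX_LEN] in local_prefixes

def is_listed(url, local_prefixes, fetch_full_hashes):
    """Decide whether url is on the blocklist.

    fetch_full_hashes (hypothetical) receives only the 4-byte prefix, never
    the URL itself, and returns the set of full hashes sharing that prefix.
    """
    if not needs_server_check(url, local_prefixes):
        return False  # The common case: nothing is sent anywhere.
    full = url_hash(url)
    return full in fetch_full_hashes(full[:PREFIX_LEN])
```

The design means that for almost every page you visit nothing leaves the machine, and even on a prefix match the server sees only a short hash fragment.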
groupr (my little JavaScript application that gives users an overview of their Flickr group membership) needs to be able to communicate with Flickr. That's really not hard; getting the most recent public photos posted by a user can be done trivially, either using feeds or the API proper.
However, most of the calls that you need to write really interesting applications require authentication, so that they can see private data. Rather than use the password antipattern, Flickr uses a well-thought-out multi-step system. Unfortunately, this can be a bit tricky to wrap your head around, and harder still to debug. It was certainly something I spent a while grappling with for groupr. That's the main reason I've split out the parts of groupr that talk to Flickr into a library on AppJet called lib-flickr-minimal.
As the name suggests, the library doesn't actually do that much. There are methods to handle the steps of authentication, and there's a generic function to call any Flickr method. However, it's more than enough for me to write both groupr and a little demo application that guides other users through the process of handling authentication.
(A little on that demo application. I spent a few minutes trying to think of a method that required read privileges that would not be too obvious and dull ("you have 500 private photos", for example). Thankfully I remembered the recently-launched flickr.places.placesForUser method, and so I decided to use that as my example call. A bit more work meant I could plot the places returned onto a Google map, so now you can see where you've taken (or at least, geotagged) the most photos.

Ideally I'd rewrite this to produce something prettier, like Dopplr's lovely raumzeitgeist images, but for now, it's a nice little one-page example.)
Philosophically, I prefer this style of library. There seem to be two schools of thought when it comes to building such things. You can tell from the source of the library that I'm in the "least possible work" camp: provide helpers for the functions that are tricky, but for most calls, let the user consult Flickr's documentation to figure out what to call, and use JSON as a return format so that everything you get back is an object (or at least, a rich data structure).
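The "least possible work" approach boils down to one generic call builder. Here it is sketched in Python rather than AppJet's JavaScript, so it is not lib-flickr-minimal's actual code. It assumes the pre-OAuth Flickr auth scheme of the time, where `api_sig` is the MD5 of the shared secret followed by every argument name and value sorted by name, and it uses Flickr's REST endpoint with `format=json` and `nojsoncallback=1` to get plain JSON back.

```python
import hashlib
from urllib.parse import urlencode

REST_ENDPOINT = "https://api.flickr.com/services/rest/"

def sign(params, secret):
    """Legacy Flickr signature: MD5 of secret + name/value pairs sorted by name."""
    payload = secret + "".join(k + str(params[k]) for k in sorted(params))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def flickr_url(method, api_key, secret=None, **args):
    """Build a request URL for *any* Flickr method, asking for plain JSON.

    No per-method code: the caller reads Flickr's documentation and passes
    whatever arguments that method expects.
    """
    params = dict(args, method=method, api_key=api_key,
                  format="json", nojsoncallback="1")
    if secret is not None:
        # Authenticated calls (e.g. with an auth_token) must be signed.
        params["api_sig"] = sign(params, secret)
    return REST_ENDPOINT + "?" + urlencode(sorted(params.items()))
```

Fetching the URL with `urllib.request.urlopen` and decoding it with `json.loads` would then give a nested dict for any method, including ones added after the code was written.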
The other camp, which I think of as being influenced by Java and other less dynamic languages, wants to provide a method for everything. As a result, their implementations tend to have lots of boilerplate code for handling every single Flickr method (there are about a hundred now), more for parsing the returned XML (rarely, if ever, JSON), and convenience methods on top for such things as constructing URLs.
While the latter style is probably superficially appealing (you get documentation in one place, and the library can error-check locally) it also has significant drawbacks. When Flickr add a method, or extend the returned data, the library has to be patched and re-released. Many libraries only implement the methods of interest to the author, leaving chunks of the API unimplemented. (These are particularly annoying for me; they tend to implement flickr.photos.search, which seems to be the cornerstone of the Flickr API, but ignore the interesting methods around the edges, which I seem to be drawn to.)
There is a nice middle way, which is to use metaprogramming and the API's own reflection methods to construct a list of allowed calls and arguments, giving error-checking but also updating automatically when Flickr add methods. The libraries I prefer for both Python and Ruby do this, and very nice they are too.
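A minimal Python sketch of that middle way follows. The list of known methods would come from Flickr's `flickr.reflection.getMethods` call; `transport` is a hypothetical callable that performs the actual request. Argument checking via `flickr.reflection.getMethodInfo` is left out for brevity.

```python
class FlickrProxy:
    """Turn api.photos.search(...) into a call to "flickr.photos.search",
    refusing any method name that Flickr's reflection API didn't list."""

    def __init__(self, known_methods, transport, prefix="flickr"):
        self._known = frozenset(known_methods)
        self._transport = transport  # hypothetical: (method, args) -> result
        self._prefix = prefix

    def __getattr__(self, name):
        # Only called for names not found normally; private names are
        # refused so copying/pickling machinery doesn't recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        return FlickrProxy(self._known, self._transport,
                           self._prefix + "." + name)

    def __call__(self, **args):
        if self._prefix not in self._known:
            raise AttributeError("unknown Flickr method: " + self._prefix)
        return self._transport(self._prefix, args)
```

Refreshing `known_methods` from the reflection API picks up new Flickr methods with no library release, while typos still fail locally.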
To be honest, this is probably where I want lib-flickr-minimal to end up, but for now, I'll happily take a library that stays out of my way rather than one that aims to do everything but only implements a few things. Hopefully others on AppJet, or those looking to implement Flickr authentication, will find it useful too.
Long-time readers here may remember groupr. (If you don't, it was a small web application that loaded the photos in your Flickr groups, something that, oddly, you can't do on Flickr itself.) I wrote it at the beginning of 2007 for Fotango's Zimki platform. Of course, when that died at the end of last year, groupr vanished, but not before I took a backup of the code and templates underlying it, in the hope that one day I might be able to revive it.
For a few different reasons, I've been considering bringing groupr back recently. I could use Google's App Engine, as I've done for snaptrip, but that was from scratch, and for this project, I didn't fancy porting both the code and templates. I had a quick look at Helma and Trimpath, but I didn't get on with either of them. There's also the fact that they're not hosted solutions, and part of the joy of server-side JavaScript (SSJS) is not having to worry about finding a server. I also tried Reasonably Smart, but you have to be pretty clever to get git working, and I couldn't, so that was out.
Eventually I found AppJet, and after a quick look I was convinced that it was probably a good place to end up. After about eight hours porting what I had, and another five or so fixing up some things I never quite polished off in the old version, you can now use groupr.appjet.net.
So, how does it compare to Zimki, and how hard was it to port the code? (After all, big names are now talking about portability in the cloud). Well, AppJet may be closed source, but they offer a downloadable JAR which ran without any effort for me on Mac OS X, meaning both that I could develop locally (even offline, with cached data), and that if AppJet vanishes (which, after all, happened to Zimki) I can take groupr and run it on a server of my own. In this case, practicality trumps theoretical openness.
AppJet's IDE feels a lot nicer than Zimki's did (although I barely use (or used) either, preferring BBEdit with AppJet's JAR, or Trawler for Zimki). I also approve of the way libraries are handled: they're just apps whose names start with the 'lib-' prefix. You can see what is using a library, and there's provision for inline documentation too. The community feels bigger than Zimki's ever did (although that might just be because the idea of SSJS is taking off), and I was able to find a few useful libraries (such as a TrimPath template port) pretty easily. Speaking of libraries, AppJet's 'storage' is oddly non-core, but it's a pretty nice row-style store with good querying facilities. It lacks Zimki's handy "expires:+2h" syntax, but that wasn't too hard to fit in myself.
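Fitting in something like "expires:+2h" mostly means turning a relative spec into an absolute timestamp stored on each row. Here is a Python sketch of that idea, not the code groupr uses; the unit letters and the `expires` row key are assumptions made for illustration.

```python
import re
import time

# Units assumed for illustration; Zimki's exact set isn't stated here.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_RELATIVE = re.compile(r"^\+(\d+)([smhd])$")

def parse_expiry(spec, now=None):
    """Turn a relative spec like "+2h" into an absolute Unix timestamp."""
    match = _RELATIVE.match(spec.strip())
    if match is None:
        raise ValueError("unrecognised expiry: %r" % spec)
    amount, unit = match.groups()
    base = time.time() if now is None else now
    return base + int(amount) * _UNITS[unit]

def is_expired(row, now=None):
    """Rows keep their absolute expiry under a hypothetical 'expires' key."""
    current = time.time() if now is None else now
    return row.get("expires") is not None and row["expires"] <= current
```

Storing the absolute time, rather than the "+2h" string, keeps the expiry check to a single comparison when rows are read back.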
One definite annoyance I have with AppJet is that they don't keep all their libraries out of the global namespace. Zimki's functionality was all hidden in a zimki object, but AppJet has a few top-level standard libraries, and 'page' and 'response' both clashed with names I was using in groupr's previous version. Another is that there's no way of handling non-JavaScript files, so both static files and templates are tricky. I've ended up with the former being hosted on my main server, and the latter as a hash of triple-quoted strings (a Python-ism that AppJet has imported into their JS runtime). Proper file support, like Zimki had, would be a boon there. However, both of these were pretty easy to overcome, and it turned out Zimki did very little that AppJet couldn't replicate. (Replacing the (Mojo, I believe) API calls was four lines of jQuery; replacing the server-side API cleverness, for my needs, was a few lines of JSON.)
Overall, then, I think I'm pretty happy with my experience so far. I've managed to revive the project without too much hair-pulling, and, as I said, even extended it from the state it was in on Zimki. Maybe server-side JavaScript has a future after all?
Previously I've done a couple of braindumps of blog posts I should have written. This time, they're site ideas, and I don't have the time to do them. Hell, I'm not even getting the half-finished features on snaptrip done; too busy putting in silly easter eggs. (At least people like the look of them.)
- twitter links to delicious
- especially for Siracusa and Gruber. Man, those two post so much.
- twitter link parser
- you know, if they had OAuth I'd be a lot happier about this one
- delicious network same-link collapsing
- when five people all link to the same thing, maybe it's really important
- so don't show it more than once
- delicious note detection
- show on a site the bits that people who leave notes on delicious choose to comment on
- xml-rpc to tumblr post bridge
- for flickr blogs, and the like
- auto-stream
- I'm sure I've mentioned this before
- use Google social graph APIs to fetch all the URLs for a user
- put out a stream like adactio's to show their activity
- but do it all magically!
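The same-link collapsing idea is simple enough to sketch. This Python version assumes each bookmark is a dict with `url` and `user` keys, a hypothetical shape rather than delicious's actual feed format, and does no URL normalisation (trailing slashes and the like would count as different links).

```python
def collapse_links(bookmarks):
    """Group bookmarks by URL so each link appears once, most-saved first.

    Returns a list of (url, [users who saved it]) pairs.
    """
    groups = {}  # dicts keep insertion order, so first-seen order is kept
    for b in bookmarks:
        groups.setdefault(b["url"], []).append(b["user"])
    # sorted() is stable even with reverse=True, so ties stay in the
    # order the links were first seen.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

The count of users per link then doubles as the "maybe it's really important" signal.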
Part of the point of jotting these all down is that execution > ideas, so if you want to take any of them and implement them, please do.
Earlier this week, I published a post entitled Hackability: Gecko vs WebKit. Actually, it sort of snuck out in the middle of the night; as is my wont, I left it as a neighbourhood-only draft overnight, only to find that Simon had turned up and made some of the points I was going to go back and edit in as a comment. Since nobody ever reads comments on the internet, and also since Simon, Tom and myself had a bit of a discussion on the subject on IRC, I thought I'd follow up the post with some further thoughts.
Firstly, Simon made further mention of Gecko 2, the next-big-thing coming from Mozilla. It does sound like it's going to be a great deal of work, and that it's probably dumb for them to be gung-ho about fixing some of the Acid3 issues in the 1.9 branch when 2.0 is around the corner. On the other hand, the very fact a 2.0 is needed is a bit worrying, and I'd also argue that perhaps there should already be a code branch being worked on. In comparison, I strongly suspect that the WebKit team have been lucky enough to time any major reworkings of their codebase to be out of the public eye.
Secondly, Gecko and WebKit both turn out to be about the same age: nearly ten years old. Admittedly, KHTML was initially much more minimal¹, but that turns out to be key to the point I really should have made in the first post.
One of WebKit's stated project goals is hackability. In contrast, Gecko's layout engine doesn't really have a project page; the closest thing there is to one admits that
much of the code for the cross-platform toolkit is mixed in with this code, it is described elsewhere
This is really, I suspect, where the two projects diverge. WebKit is designed as an embeddable renderer. It's pretty modular - one of the KHTML legacies is that the JavaScript implementation is separate, as you can see from the fact that Quartz Composer embeds JSCore but not WebKit. It does one thing, and does it pretty well. (The same is somewhat true of the most common browser using it, Safari.) Given this, it's no wonder that it's adding features and passing tests at a rate of knots.
In contrast, Gecko is part of Mozilla's platform. It's a fairly major part of XULRunner, a cross-platform application framework that hosts Firefox, Thunderbird and other apps. That makes it big and gnarly, and probably makes it harder to work with, but it also offers a great deal of flexibility. Firefox is so extensible because the UI is itself written in a markup language, and it can be modified and extended without learning (too much) new stuff. In contrast, there are no supported ways to extend Safari's interface².
So far, I think we all agreed. Different goals lead to different priorities, and as Simon put it:
Teams focus on different things. saying "Well it's Apple's policy not to do plugins" is like saying "It's Mozilla's policy not to spend resources on what could quite reasonably called a PR exercise" but, as we know from Perl, PR exercises and easy hackability keep a project alive.
Where we parted company was that Simon argued one of the keys to the success of Firefox was this extensibility. Since it comes at a cost, it would be good if one could argue that it had also had a benefit. Unfortunately, I personally doubt that the success it's managed against IE (which, compared to the market share of its prehistoric ancestor, Netscape Navigator, isn't actually that good - although numerically I'm sure Firefox has more installs than Navigator ever managed) has anything to do with extensibility. Do real people install extensions - even lauded ones like Firebug and AdBlock? Somehow, I doubt it.³
No, the success of Firefox is down to offering a familiar enough UI (Opera fails here, however compelling the features are once you get through to them), being free (again, until recently, an Opera weakness), and having better security than IE (perhaps less so now, but older versions of Microsoft's browser didn't get their reputation for nothing). That's not to say that the work on XUL isn't good - extensibility helps get geek mindshare, just like passing Acid tests - but again, I do wonder if it's costing more than it's worth. After all, if easy hackability keeps a project alive, what does a lack of the same do?
¹ A fair chunk of the discussion was about how KHTML and WebKit were related. I don't think it's particularly relevant here and nobody bothered with the relevant Slashdot forensics, so I'm going to skip it.
² Hopefully that wording is clear enough to be understood as meaning "things that aren't browser plugins". I believe that none of the other WebKit based browsers offer much of a UI customising API either.
³ I'm more willing to believe they install themes. Shudder.
This post has been brewing for a while, but I've been prompted to actually write it by seeing John Gruber's offhand remark on his most recent linked list entry, about CSS gradients in WebKit:
Just me, or is WebKit racing way ahead of Gecko in terms of support for cool new stuff?
No, it's not just him. WebKit, and Opera's layout engine Presto, raced towards Acid3 compliance in March, with both effectively reaching a photo finish on the 26th. Meanwhile, Microsoft hasn't even shipped a non-beta Acid2 passing browser¹; no surprise there. But where's Gecko, the Mozilla layout engine, the one that powers Firefox?
Well, to be blunt, it doesn't look as if they care much. We have one developer saying that Acid3 is basically worthless, and another (more diplomatically) stating that it's a missed opportunity and an exercise in making browsers jump hoops, rather than improve "real" functionality. As others (almost certainly more qualified than I am) have noted, this sounds a lot like the noises from Microsoft around the time of Acid2's release.
The thing is, I'm not here to kick Gecko, but to understand its problems, if it has them. Does the team's response to Acid3 mean it does? Possibly not on its own, but coupled with events like the move of Epiphany to WebKit², and the aforementioned speed of development on WebKit (and to a lesser extent Opera's Presto³) I have to wonder. Why is development there so slow?
One stated reason is that the Mozilla Foundation is in a rush to release Firefox 3, and it's certainly true that it is coming up for release. On the other hand, Apple certainly seem to be able to keep the open-source WebKit tree distinct from the version used in releases - Safari 3.1 shipped with an Acid3 score of 75 when nightlies were scoring 90-odd - so I should hope that's not the real reason. Maybe they're pulling people off the layout engine to work on the browser? That's not as stupid as it would be for most apps, given that the Firefox UI is laid out using XUL, a markup language. Even so, it feels like a bad use of engineering. Maybe Gecko's reached that point where extending it's no fun. The language the team themselves use, with talk of Gecko 2, makes me wonder if that's true.
I don't have answers, anyway, but if you do, I'd love to hear why Gecko is giving the appearance of stagnation while WebKit seems full of life.
¹ Bafflingly, it seems that Microsoft develops not one, but four layout engines: Trident, for IE/Win, Tasman, originally for IE/Mac and now part of Office:Mac, and two unnamed engines, one in Word and Outlook 2007, and another in Expression Web Designer.
² Admittedly that's got contributing factors beyond merely Gecko; it sounds like the wrapper they were using to embed it (GTKMozEmbed) had some seriously nasty issues of its own.
³ I mentioned to Tom Insam that I was surprised I'd never heard of the name of this engine, but he sagely noted that, as it's not open source or embeddable, there's no reason I would have.
A year or so ago, whilst writing groupr (RIP), I came up with what I thought was a useful name for something I found myself doing a lot: the API join. I'm fairly sure this is common to a lot of Web (2.0) APIs, but it's especially common with Flickr. Take groupr, for example. First, it made a call to get the groups you're a member of; then, for each of these, it fetched the photos in the group. Obviously, this has a problem: as the number of groups you're a member of goes up, so does the number of calls to the API, and each call takes about a tenth of a second. The only way to mitigate this, and the solution groupr used, was to page the groups, and even that leaves you making as many calls as you have groups on the page.
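The cost of the API join is easy to see by counting calls. In this Python sketch, `get_groups` and `get_group_photos` are hypothetical stand-ins for the two Flickr calls. With 50 groups the unpaged join makes 51 calls, roughly 5 seconds at a tenth of a second each; paging 10 groups at a time cuts that to 11 calls per page.

```python
def api_join(get_groups, get_group_photos, page=1, per_page=None):
    """One call for the list of groups, then one call per group joined.

    get_groups / get_group_photos are hypothetical stand-ins for the API.
    per_page, if given, limits how many groups are joined at once.
    Returns (photos keyed by group, number of API calls made).
    """
    groups = get_groups()
    calls = 1
    if per_page is not None:
        start = (page - 1) * per_page
        groups = groups[start:start + per_page]
    joined = {}
    for group in groups:
        joined[group] = get_group_photos(group)  # the per-row call
        calls += 1
    return joined, calls
```

Paging only bounds the number of calls; it is still one request per group shown.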
The problem reared its head again when I was looking at doing a ffffound-inspired Flickr favourites app. I wanted to display the usual Flickr size, rather than square thumbnails (as Flickr's own favourites page does). Unfortunately, the standard call to get favourites didn't list the size of the photos, and I really didn't want to spend two seconds fetching them all. Other people have raised similar questions on the Flickr API group discussions; for example, here's one about getInfo and getExif, and here's another about getting photo sizes.
Imagine my surprise, then, when I looked at the documentation for flickr.photos.search and noticed a new value for the "extras" parameter: o_dims. It turns out this returns the original height and width, and is also available in the favorites methods, so now it's possible to avoid doing those calls, and to embed derived height and width for web-scale images from a single call, even for the 36 or so images on Flickr's version of the page.
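Deriving the web-scale size from o_dims is a small piece of arithmetic. This Python sketch assumes the resized image is defined by its longest edge (500px for Flickr's default "medium" size at the time) and that small originals are never upscaled; Flickr's own rounding may differ by a pixel. The API returns the dimensions as strings, hence the `int()` calls.

```python
def scaled_dims(o_width, o_height, longest=500):
    """Estimate a resized photo's (width, height) from its original size.

    Assumes the longest edge is scaled to `longest` pixels and that
    images smaller than that are left at their original size.
    """
    w, h = int(o_width), int(o_height)
    scale = min(1.0, longest / max(w, h))
    return round(w * scale), round(h * scale)
```

For a 3000×2000 original this gives 500×333, enough to write correct `width` and `height` attributes without any extra API call.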
Of course, this is simply because the API has now moved the join deeper; instead of being at the API level it's being done inside Flickr (presumably at the database level). In fact, I suspect that last weekend's database downtime may not be unrelated (perhaps it was needed for the launch of Apple TV's Flickr slideshows?). It also doesn't help with the other methods, such as getExif (there's a reason I've moved some of my EXIF data to machine tags, which are fetchable with another extras parameter to many calls).
Facebook, interestingly, allows a SQL-like query language as part of their API access, but I wonder how they deal with queries that could bring the database to its knees. I do notice the line
In order to make your query indexable, the WHERE should contain an = or IN clause for one of the columns marked with a *.
Is that an enforced criterion, or merely a recommendation? And do they kill long-running queries, returning no results, to keep up database performance? It's the sort of thing I'd love to see Flickr add to their API, but I can imagine the problems are far from trivial, and in the meantime, I'm very happy to see one API join bite the dust.
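For a sense of what enforcing that rule might involve, here is a deliberately rough Python check that a query's WHERE clause uses = or IN on at least one indexable column. It is a regex, not an SQL parser, and a guess at how such a rule could be checked, not how Facebook actually does it; the column list is supplied by the caller.

```python
import re

def is_indexable(query, indexable_columns):
    """Rough check: does the WHERE clause use = or IN on an indexable column?

    A regex sketch only; quoting, comments and aliases are not handled.
    """
    match = re.search(r"\bWHERE\b(.*)", query, re.IGNORECASE | re.DOTALL)
    if match is None:
        return False  # No WHERE at all: certainly not indexable.
    where = match.group(1)
    for column in indexable_columns:
        # Column name as a whole word, then "=" or the keyword IN.
        pattern = r"\b%s\s*(=|\bIN\b)" % re.escape(column)
        if re.search(pattern, where, re.IGNORECASE):
            return True
    return False
```

Even this crude version shows the appeal: a query can be rejected up front, before it ever reaches the database.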
A couple of notes on my previous posts. groupr's had another quick release. Just after cloning it to the live domain last night, I quickly took out the obsolete (and distracting) toggling of group visibility, which Gareth had suggested on Wednesday. Tom persuaded me to clone it across this morning, so there it is. I'm still aiming to fix up the HTML this weekend, if I can get TT style modifiers to work in Trimpath templates.
Tom's also to blame for the other update. Apparently, while playing with Pipes, he observed a stack trace mentioning XML::Feed, the nice Perl module for abstracting RSS and Atom data, which led him to find my observation that there was no nice abstraction even more baffling. Oh well; hopefully that means it'll be easy to fix in a new version.
For the last couple of months I've been sitting on a CGI script that would aggregate all my content for my personal site. Part of the reason is caching: the script doesn't have any, and it turns out that XML::Feed isn't Storable friendly, which knocks out my first approach.
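One way round parser objects that won't serialise is to cache plain data instead. This Python sketch (not the CGI script's actual code) keeps normalised items as JSON on disk and refetches once the file is older than `max_age` seconds; `fetch` is a hypothetical callable returning a list of plain dicts.

```python
import json
import os
import time

def cached_items(path, max_age, fetch):
    """Return feed items from a JSON cache file, refetching when stale.

    fetch (hypothetical) must return plain data: dicts, lists, strings,
    numbers. Caching that, rather than parser objects, avoids the
    serialisation problem entirely.
    """
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # Missing or corrupt cache: fall through and refetch.
    items = fetch()
    with open(path, "w") as f:
        json.dump(items, f)
    return items
```

The trade-off is that dates and other rich values must be flattened to strings before caching and re-parsed afterwards.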
So when I had a think about what Yahoo's Pipes promises, I thought it might be worth a look. I could get the service to do all the heavy lifting, hope they had a sensible caching policy (and if not, well, at least it was Someone Else's Problem), and then just format a single RSS feed locally.
Sadly, there's a major problem. The aforementioned XML::Feed Perl module does a wonderful job of hiding the mess of formats that labour under the acronym RSS and the name Atom. If you want to sort by date, you can do so easily. (In fact, you get lovely Perl DateTime objects. I can't sing DateTime's praises enough, even if it does look daunting at first.) Pipes, however, doesn't. I can sort my Vox Atom feed by its pubDate property, or my delicious and husk RSS feeds by dc:date, but the two don't share a date format, so I can't sort the combined output.
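The missing step is a normalise-then-sort pass. Sketched in Python rather than Perl, this handles RFC 822 dates (as used by RSS pubDate) and ISO 8601 dates (as used by dc:date and Atom). Items are assumed to be dicts with a raw `date` string, and dates without a time zone are assumed to be UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_feed_date(value):
    """Normalise RFC 822 or ISO 8601 date strings to an aware datetime."""
    value = value.strip()
    if value[:4].isdigit():
        # ISO 8601, e.g. "2003-06-10T04:00:00Z". Older Pythons'
        # fromisoformat rejects a trailing "Z", so spell it out.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)
    else:
        # RFC 822, e.g. "Tue, 10 Jun 2003 04:00:00 GMT".
        parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumed UTC
    return parsed

def merge_items(*feeds):
    """Merge items from several feeds into one list, newest first."""
    items = [item for feed in feeds for item in feed]
    return sorted(items, key=lambda item: parse_feed_date(item["date"]),
                  reverse=True)
```

Comparing aware datetimes means a "+01:00" entry and a "GMT" entry sort correctly against each other, which is exactly what sorting on the raw strings can't do.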
I had a quick look to see if there was an obvious way of doing a date transformation on an element of an item, but unless I'm missing something it's far from obvious. I could write a small web service and call it, but that's a lot of work, and I might as well do things locally if I'm that bothered. So I've given up, but not before writing this, because it seems like a natural thing to handle in such a high-level environment, and I'm surprised they don't.