3 posts tagged “machine tags”
A few weeks ago, when I was finally prompted to write up my EXIF to machine tags script, I parenthetically remarked that:
ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for 'things you'd like to see in Flickr' if I'd felt able to get away with being so demanding
Funnily enough, a mere week after posting that, Aaron Straup Cope posted to the yws-flickr group, announcing exactly what I'd obliquely asked for: methods to work with the parts of all machine tags on Flickr. I set to work, and by that weekend had produced a machine tag browser.
Thanks to some coding help from Tom Insam and suggestions by Ryan Gallagher, the currently live version is a fair bit nicer than the initial version. The code is still a bit of a mess internally (there's far too much repetition), there are some bugs (values with full stops (or decimal points) in particular), and I still have three items on the TODO list.
Despite this, it's still sufficient for users to see that the astrometry.net system has been able to solve about 85% of the images it's processed; that three images have had an ImageMagick Lomo effect applied before upload; the names of Len Peralta's monsters by mail; and where people take screenshots in Second Life. In fact, I've been pleasantly surprised to note that the code.flickr blog mentioned it when Aaron launched machine tag hierarchies to the wider world.
As it says on the browser itself, the source code (all the clever stuff is in JavaScript) is available on github, and I'd love to receive fixes, changes, or requests. In the meantime, have fun looking around.
Last week Kellan from Flickr published my interview on code.flickr. I'm still somewhat amazed that they chose me to ask, but then I'm also pleased at how much people are liking snaptrip, and I'm happy to see my words in print, as it were.
I actually compiled my answers a couple of weeks before it was posted, hence the reference to groupr as a "lost project". Now, of course, it's back, but I've already posted a couple of times about that. What I would like to do is - finally, and belatedly - document (and update the released version of) my EXIF machine tagger.
Why bother with such a thing? Flickr will extract EXIF metadata, but it won't allow you to do any aggregate queries on it. (Well, that's not quite true; at dConstruct 2007 Tom Coates leaked some URLs which I picked over, but they don't cover all the useful things I'd like. Plus, it's not documented.) By extracting all the data from my photos into machine tags (and a local SQLite database), it becomes possible to point people at all the photos taken at the wide end of my widest lens, or those taken with a particular make of camera (and to do more complex queries locally).
With that out of the way, how do you go about such a thing? Well, as usual, it's actually a fairly simple joining operation. Get a list of photos, and for each of them, get the EXIF data (using flickr.photos.getExif), then store the data locally, and add tags back to Flickr. There's not much munging involved - I convert spaces in the EXIF field names to underscores, and some things get put in the "file:" or "camera:" namespace, rather than "exif:" - so it's all pretty straightforward. (I do preserve spaces in the EXIF values, though, by quoting my arguments to the addTags method.)
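As a rough illustration of that munging, here is a minimal Python sketch (the script itself is in Perl). The function names, the sample field labels, and above all the table of which fields go into "camera:" or "file:" are assumptions made up for the example; the real mapping isn't listed in this post. Quoting the whole tag is likewise just one plausible reading of "quoting my arguments".

```python
# Hypothetical table of EXIF labels that go somewhere other than "exif:".
# These three entries are illustrative only, not the script's actual list.
NAMESPACE_OVERRIDES = {
    "Make": "camera",
    "Model": "camera",
    "File Size": "file",
}


def machine_tag(label, value):
    """Build one quoted machine tag from an EXIF label and its value."""
    namespace = NAMESPACE_OVERRIDES.get(label, "exif")
    # Spaces in the field name become underscores.
    predicate = label.replace(" ", "_")
    # Quote the whole tag so spaces inside the value survive when several
    # tags are sent as one space-separated string. (Values that themselves
    # contain double quotes are not handled in this sketch.)
    return '"{}:{}={}"'.format(namespace, predicate, value)


def tags_argument(exif):
    """Join all the tags for one photo into a single string."""
    return " ".join(machine_tag(label, value) for label, value in exif.items())


if __name__ == "__main__":
    print(tags_argument({"Make": "Canon", "Focal Length": "10 mm"}))
```

The string returned by tags_argument is the sort of thing that would be passed as the tags parameter when adding tags to a photo.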
I also add a meta:exif field with either "none" or the epoch seconds of the time of tagging, so that it's easy to exclude previously-tagged images from being examined again. Another minor niggle is that, to add tags, a script has to be authorised. I copied the code chunk from the flickr_upload script in a Perl module, and it seems to work for me.
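The marker tag could be sketched like this in Python. The helper names are hypothetical, and checking a photo's tag list locally is an assumption for the sake of the example; the actual script may well exclude tagged photos some other way, such as in the query that fetches the list.

```python
import time


def marker_tag(exif_found, now=None):
    """Return the meta:exif tag: "none" if the photo had no EXIF data,
    otherwise the epoch seconds at the time of tagging."""
    if not exif_found:
        return "meta:exif=none"
    stamp = int(time.time() if now is None else now)
    return "meta:exif={}".format(stamp)


def already_tagged(tags):
    """True if any of a photo's tags is a meta:exif marker."""
    return any(tag.startswith("meta:exif=") for tag in tags)


if __name__ == "__main__":
    photo_tags = ["copenhagen", marker_tag(True)]
    print(photo_tags, already_tagged(photo_tags))
```

Using "none" rather than leaving the marker off means that photos with no EXIF data are also skipped on later runs.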
However, the fact that users need to get an API key, secret, and then a token, is naturally going to limit the audience for such a script. A few other users have metadata in the "exif:" namespace, but it's not exactly common. It's hard to turn the script into a web app, too, since it needs about a second per image to run, and the first run has to examine your entire library, which these days is typically thousands of images. I may still do it, but I haven't bothered for months, so I wouldn't count on it.
Another drawback is that machine tags are normalised at Flickr. This means that when I query on exposure bias, both -1/3EV and +1/3EV show as just "exif:exposure_bias=13ev". I've been thinking about ways around this - by querying raw tags - but it's not straightforward. (Ways around this normalising, and ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for "things you'd like to see in Flickr" if I'd felt able to get away with being so demanding.)
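To see why the two values collide, here is a small Python approximation of the normalising. It is not Flickr's actual algorithm, only a guess (lowercase the tag, then keep just letters and digits in the value) that happens to reproduce the example above; the function names are made up.

```python
def normalise_value(value):
    """Approximate the normalising: lowercase, then drop anything that
    is not a letter or a digit."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def normalise_machine_tag(raw):
    """Normalise only the value part of namespace:predicate=value,
    leaving the underscore in the predicate as in the example."""
    prefix, _, value = raw.partition("=")
    return "{}={}".format(prefix.lower(), normalise_value(value))


if __name__ == "__main__":
    for raw in ("exif:exposure_bias=-1/3EV", "exif:exposure_bias=+1/3EV"):
        print(raw, "->", normalise_machine_tag(raw))
```

Both inputs come out as exif:exposure_bias=13ev, because the sign and the slash are exactly the characters that get thrown away. Telling them apart means going back to the raw tags.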
One final observation is that the script's in Perl, and uses XML (which is, apparently, sometimes compressed at Flickr's end; at least, I had to add Compress::Zlib at one point for some reason). If I was to redo it, either in Python or Ruby, the data would all be fetched as JSON, and it'd probably get a few more users. Ah well. Installing the prereqs shouldn't be too hard.
That said, of course the script, as is, proved useful. I run it manually after an upload, while Tom, who is (as ever) a bit more sensible, has his fork running as a cron job. Either way, please download it, play, and feel free to let me know what you think.
Having finally got snaptrip out there, I'm hoping you'll allow me a little (pretentious?) waffle about why I wrote it, where it fits, how I made some of my decisions, and what's next.
I'm a big fan of Flickr's machine tags. Most of my images have at least ten - mostly generated automatically, like my EXIF machine tags - and I tend to add geographic metadata as well. As such, it's probably not a surprise that I'd write an application that made Dopplr trip IDs available. The big surprise is that I bothered to make it accessible to most people, by building it as a website not a script.
Why a website? Well, I thought I'd like a nice interface as much as anyone, and I also know that to make a machine tag truly useful you need as many people as possible using it. Asking folk to download a script, get a key, and use a command-line interface - or no interface at all - isn't going to work.
Speaking of Dopplr, I don't think I've seen a talk by anyone there since it started, but I do think I've picked up their philosophy from slides and abstracts online. The phrase that tends to crop up is a "coral reef", the idea being that there's a web of data available on the internet and that by doing one thing, and doing it well - the old Unix philosophy, really - you can live in a happy niche. Well, snaptrip lives on part of the coral built by the two companies whose APIs it consumes.
I'm not under any illusions: it's likely that most users won't care about their past trips, or matching their Flickr photos. Those who do will probably only visit the site once, tag a few trips, and then leave. That's fine.
In my previous post I alluded to some decisions I made about the geotagging features in snaptrip. To be honest, it wasn't something I'd considered at first, but seeing Richard Crowley's Dopploadr hack - which does some of the same things as snaptrip, but as photos are uploaded rather than by looking for existing Flickr photos - made me consider the possibility. However, because I am looking at things that have probably accumulated metadata already, snaptrip is careful not to overwrite any information that's already there.
snaptrip adds fewer tags than Dopploadr. It won't add human-readable tags at all, and it adds the geographical data at a relatively low level of accuracy. I didn't want snaptrip to assert with precision that all these photos were taken dead in the centre of Copenhagen, since they probably weren't. My US trips show exactly the sort of thing I'm talking about: most of my pictures are actually taken anything from ten to two hundred miles from where Dopplr thinks I was staying. Similarly, it doesn't set a woe:id machine tag, instead preferring to use the dopplr:woeid namespace/predicate pair.
It's quite possible I'm overdoing the paranoia here, and so I'll probably add the option to set more tags later, but for now, I'm happy to tread lightly. (In that vein, snaptrip doesn't set a visible "snaptrip" tag, as many apps do (Shozu and AirMe spring to mind; Picnic also suggests adding its tag). However, it does set a dopplr:tagged=snaptrip machine tag, and I should probably make that optional also. For now, you can use Flickr's tag tools to delete it.)
So, what's next? Well, the basic functionality I wanted seems to be there and stable, so I'm now considering two further avenues. I'm trying to develop tools to give you some views on the aggregated data from your past trips, but perhaps I should instead be looking at tools to increase the amount of stuff in that Dopplr history. I've got a couple of ideas...