Liminal Existence

Automatic Bootstrapping of Rev=canonical

Kellan’s work on making URL shortening not suck is great, but killing and tinyurl just isn’t going to happen. Sadly for Joshua, this is the way the internet works, and if he doesn’t like it, tough shit.

So far I’ve only seen one site that (awesomely) shortens URLs with rev=canonical (I’m sure there are more, but I haven’t seen them. So there.) Simon Willison has done some great work on his blog, and throwing at Kellan’s shortener results in Brilliant!
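For the curious, here is roughly what a rev=canonical-aware client has to do with a page once it has fetched it. This is a minimal sketch using only Python's standard library, not Kellan's or Simon's actual code; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class RevCanonicalParser(HTMLParser):
    """Collects the href of every <link rev="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        # rev holds a space-separated list of link types; compare case-insensitively.
        rev_values = (attr.get("rev") or "").lower().split()
        if "canonical" in rev_values and attr.get("href"):
            self.short_urls.append(attr["href"])


def find_short_urls(html):
    """Return the rev=canonical hrefs advertised by an HTML document."""
    parser = RevCanonicalParser()
    parser.feed(html)
    return parser.short_urls
```

If the list comes back empty, the client just falls back to whatever shortener it would have used anyway.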

Except no one will use it, because, well, is useful for doing things like tracking how far that link you sent got, and there’s a degree of muscle memory involved. This is the sort of vi-versus-emacs argument that just isn’t going to go away. Also, here’s the same post shortened by, tinyurl and this Simon-Willison’s-post-about-rev=canonical specific URL shortener that I just linked to in the previous paragraph. So why fight it? You really can’t win.

There is hope, though. I live in the dark ages, and Blogger publishes my posts as static HTML over SCP or FTP or some other totally inappropriate protocol. Since there’s no <$BlogItemShortURL$> tag in Blogger’s template syntax, I’m completely unable to do what Simon’s done, without migrating to a self-hosted blogging system (contrary to popular belief, not all programmers are compelled to write their own blogging systems (erm, on second thought, ignore twitter)).

Anyhow, it turns out that I can be as cool as Simon, just with one step of indirection:

<link rev="canonical" href="<$BlogItemPermalinkURL$>"/>
<link rev="canonical" href="<$BlogItemPermalinkURL$>"/>

Strike that, Blogger fucking sucks and so I’ve created a shell script that converts a placeholder into rev=canonical links. Man, that was a pain in the ass. Why do I still use Blogger? Anyhow, the point stands if your blogging software doesn’t totally suck and will give you a permalink anywhere in the template engine. Which is probably true unless you’re using Blogger. Ugh.
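The placeholder trick looks something like the following. My real version is a shell script; this is a Python sketch of the same idea, and the placeholder string and the shortener endpoint below are hypothetical stand-ins, not what I actually use.

```python
from html import escape
from urllib.parse import quote

# Hypothetical marker dropped into the blog template by hand.
PLACEHOLDER = "<!--REV_CANONICAL-->"

# Hypothetical "create a short URL for this page" endpoints, one per shortener.
SHORTENER_ENDPOINTS = ["http://shortener.example/create?url="]


def rev_canonical_links(permalink):
    """Build one <link rev="canonical"> tag per shortener for a permalink."""
    tags = []
    for endpoint in SHORTENER_ENDPOINTS:
        # Percent-encode the permalink so it survives as a query parameter.
        href = endpoint + quote(permalink, safe="")
        tags.append('<link rev="canonical" href="%s"/>' % escape(href, quote=True))
    return "\n".join(tags)


def fill_placeholder(html, permalink):
    """Swap the placeholder in a published page for the generated link tags."""
    return html.replace(PLACEHOLDER, rev_canonical_links(permalink))
```

Run it over each published file with that post's permalink and the static HTML ends up with the links baked in.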

So this is great. Now my blog posts are rev=canonical™ compliant, and you didn’t have to change anything at all, beyond shoving a couple of lines of HTML into your blog template. People that use and tinyurl are happy, because they don’t need to change their behaviour, and people that use rev=canonical are happy, because they can get a short URL just by following the links provided for them.

Now of course this doesn’t address two of Joshua’s concerns. First, I still have no idea where my traffic is coming from, because I don’t run my own URL shortener. I don’t want to run my own URL shortener. What I see here is an opening for and/or tinyurl to allow me to see the stats of redirects (Dear FeedBurnerGoogle: Please purchase or tinyurl or build your own), which they should be able to do easily, since all I would need to do is prove that I own the domain by sticking some secretly named file at the domain root or otherwise.
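The "secretly named file" check could work along these lines. This is a sketch of one plausible scheme, not anything a shortener actually offers: the shortener derives an unguessable file name from a server-side secret, my account and my domain, and then checks that the file exists. All names here are hypothetical, and the HTTP fetch is passed in as a function so the sketch stays self-contained.

```python
import hashlib
import hmac


def verification_path(server_secret, account_id, domain):
    """Derive the secret file path a given account must publish on a domain."""
    msg = f"{account_id}:{domain.lower()}".encode("utf-8")
    # HMAC keeps the name unguessable without the shortener's secret.
    token = hmac.new(server_secret, msg, hashlib.sha256).hexdigest()[:32]
    return f"/verify-{token}.html"


def ownership_proven(server_secret, account_id, domain, fetch_status):
    """Return True if the expected file is served from the domain root.

    fetch_status is any callable mapping a URL to an HTTP status code.
    """
    url = f"http://{domain}{verification_path(server_secret, account_id, domain)}"
    return fetch_status(url) == 200
```

Because the account is part of the token, someone else can't claim my stats just by noticing which file I uploaded.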

The second concern that remains unaddressed is what happens when the URL shorteners go away? Well, we have the same problem on the web. The answer to that was/is The Internet Archive, so herewith The (Tiny) Internet Archive.

Currently it’s just a proof-of-concept. I don’t guarantee in any way that the links posted there will persist. That said, the free quotas that come with Google App Engine allow enough space to store around two million links, and every additional two million links costs $0.15 per month, so I’m sure it won’t be a problem. Who knows, maybe will take it over?
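To put numbers on that, here is the back-of-the-envelope arithmetic as code. It only uses the two figures above and assumes, purely for simplicity, that storage is billed in whole blocks of two million links; the real billing may well be proportional.

```python
import math

FREE_LINKS = 2_000_000       # roughly what the free quota holds
LINKS_PER_BLOCK = 2_000_000  # size of each paid increment
CENTS_PER_BLOCK = 15         # $0.15 per month per increment


def monthly_cost_cents(total_links):
    """Estimated monthly storage cost in cents for a given number of links."""
    extra = max(0, total_links - FREE_LINKS)
    # Assumption: partial blocks are rounded up to a full block.
    blocks = math.ceil(extra / LINKS_PER_BLOCK)
    return blocks * CENTS_PER_BLOCK
```

Under those assumptions, ten million archived links would run to 60 cents a month.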

The code that’s up is available on GitHub here: TinyArchive. If you have suggestions, please send them my way or fork the code and send me patches. It was just a pre-coffee morning hack, my first stab at App Engine, and my first Python code in what seems like forever.