Thursday, 3 November 2011

A 3-rectangle 17x17 grid

I've made no secret of my obsession with the 17x17 challenge. I started working on it in November 2009 and went straight at it for six months. At that point I had to stop to code/write up my PhD but started working on it again as soon as I was done. This problem has given me a reason to learn so many amazing things in both math and programming, that I would be happy to have worked on it even if I never produced anything worthwhile. It's now a weekend project given that I run a startup, but, after almost 2 years, I have something to show the world: A grid with 3 rectangles.

To be clear, this is not a solution. It would need to have 0 rectangles to be a solution. But it is the least broken solution I know of. Bill Gasarch posted a 4-rectangle grid by Rohan Puttagunta along with the problem, and that has not been improved on since. Here is a leader board of the best grids known so far.

This solution is less impressive than that one in that it's missing two cells rather than one, but it does have fewer rectangles when extended to a full colouring, which is what makes it interesting. Without further ado, the solution, with the rectangles marked out:

If anyone has code they want to use to analyse this, here it is again in a machine-friendlier format:


In case anyone hasn't noticed, this solution is symmetric, i.e. colour(x,y) = colour(y,x) if you start numbering from the top left.
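
For anyone who wants to analyse grids like this one, here is a minimal sketch of the two checks mentioned above (my own code, not the method used to find the grid; it assumes the grid is given as a list of equal-length strings, one character per cell colour):

```python
from itertools import combinations

def count_rectangles(grid):
    """Count axis-aligned rectangles whose four corners share one colour."""
    total = 0
    for r1, r2 in combinations(grid, 2):
        # For this pair of rows, count the columns where both agree, per colour.
        agree = {}
        for a, b in zip(r1, r2):
            if a == b:
                agree[a] = agree.get(a, 0) + 1
        # k agreeing columns of one colour form k*(k-1)/2 rectangles.
        for k in agree.values():
            total += k * (k - 1) // 2
    return total

def is_symmetric(grid):
    """Check that colour(x, y) == colour(y, x), numbering from the top left."""
    n = len(grid)
    return all(grid[x][y] == grid[y][x] for x in range(n) for y in range(n))
```

For example, `count_rectangles(["11", "11"])` returns 1, since all four corners of the single 2x2 rectangle share a colour.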

I have a lot more to write about the solution and method, and I'm not yet done with this approach, but if I start writing up I may never publish this, so I'll just stop here for the moment.

Sunday, 2 October 2011

An accidental survey of the Hacker News ecosystem

So my little rant a week ago made a bit of a splash, briefly occupying the top of Hacker News. Looking at the inbound traffic, what surprised me was the variety in the HN ecosystem, particularly the alternative front ends that people use to access the HN stream.

Writing an HN front-end is an uncertain proposition: you only get to talk to your audience once, at launch. You don't have a viral loop, so most of your users will come from that initial spike, plus whatever word of mouth brings. There are a few online listings, but the question is how many people will see those without a static link from HN. This is my attempt to bring a little more attention to these very cool but underappreciated projects.

What I saw in the logs is that at least 3% of the HN traffic came from alternative front ends. The number may seem small, but zooming into those visits reveals the surprising variety of ways people consume Hacker News. At this point I'd like to apologize to the coders of native apps: I can't see you in my logs, because there is no referrer in the traffic you sent my way. I know that 20%+ of the traffic I got is unaccounted for (and it damn sure wasn't 3600 people dropping in to see what's on my blog), but I can only guess where it came from. With that said, here's the situation, if my analytics dataset is to be trusted:

The most popular alternative HN front-end is hckrnews by @wvl, sending 69 visitors over. hckrnews competes with the default front end head-to-head by offering multiple convenient ways to access the submissions. It is also well integrated with the Hacker News plugins made by the same author for Chrome and Safari.

iHackerNews targets an unmet need, namely the lack of an official mobile website. iHackerNews is built by @ronnieroller and also offers something else much requested of HN: an API. 51 visitors were iHackerNewsers.

The hardest HN alternative to measure is @cperciva's Hacker News Daily. The idea is simple: a feed with each day's 10 most upvoted submissions. It means that even if you're away for a day or two, you can still catch the best posts when you come back. The reason it's hard to measure is that only a fraction of readers go through the website, which sent 31 visits; I would expect most to consume it through the feed (as I do). Google Reader tells me that Hacker News Daily has 1061 subscribers. That's significantly lower than the 36,564 subscribers to the main HN feed, but I suspect these readers are much more likely to be engaged, since they get only one feed item per day.

Another mobile-friendly front end is one made, as far as I can tell, by @JoelSutherland, @KrisJordan, and their team. While not as feature-complete as iHackerNews, I prefer its look and feel, which seems more tailored to a mobile device. 13 visitors came that way.

12 more visitors came through another mobile-optimized site. Its selling points are Instapaper integration and optimization for iPhones using iUI.

For the fans of a more traditional reading experience there is Hacker Newspaper by Giles Bowkett. As you might expect, it renders the stories of the HN front page in a newspaper format. I imagine this would work quite well with a tablet. Hacker Newspaper sent 10 readers.

An interesting take on the HN front page is a sortable version, which allows you to sort by points, comments, domain, submitter, and age, amongst other things. 4 hnsorters showed up.

Another 3 came via one of my favorites, Hacker News Reader, which is a serverless app. This means you download a static file that, when executed in your browser, fetches the HN front page, parses it, and presents it in a different way. No server is involved, which means that if login were implemented, your credentials would never need to touch the developer's server. HN Reader is also installable as a Chrome Web App. While feature-limited, it looks optimized for mobile and also rocks Instapaper integration. Definitely one to keep an eye on.

Then there is the long tail: the alternative front ends that barely registered, sending a single user my way. Since there are so many projects in this list, I am sure I have missed as many, if not more, that didn't send anyone to my post. One of them is the only project that explains its mission with a rage comic.

Then there’s HN overload, which nicely organizes each day's top links in a clean format. It scores by aggregating HN points, reddit karma, and number of retweets. Definitely tempted to use this one more.

There's one that organises hourly snapshots of the HN front page using an etherpad-style timeline. Genius. Another promises to optimize your HN reading experience by letting you see the front page on the left, the article page on top, and the comments page at the bottom. A very interesting experiment.

Another very clean and mobile-friendly site caught my eye as well. I like the minimalist, well-spaced layout, as well as the refreshing blue colour scheme in a sea of orange competition.

As is obvious there is a lot of variety in the HN ecosystem. I hope at least one of these apps convinced you to give it a try, even if just for a day.

Besides the dedicated HN front-ends, I really should also mention the aggregators: websites that pull together an HN feed with feeds from other relevant sites. Between them, the aggregators sent 94 user agents this way.

Another benefit of being able to dive into the referrer logs is seeing which part of HN people came through from. If you're wondering what that means, read on. There's /best, where articles rise and fall much more slowly, giving you access to great articles you may have missed over a few days of absence. There's /classic, which applies a different sorting algorithm; I believe it has something to do with using only veteran users' votes. Then there's one I didn't know existed: /over. This allows you to set the threshold of points you want to see articles, well, over. Going to /over?points=100 will show you the latest(?) articles that have over 100 points. As for the two people who arrived with that as the referrer: you, sirs, are gentlemen and hackers.

Since we're talking data, another lesson I learned is that those 'follow me on Twitter' links you see at the end of some blog posts? They get clicked. I added one to my previous post a few hours after the initial spike, and I estimate it brought about 1.5 new followers per 1000 reads. Not earth-shattering, but not too bad, either.

What do those links look like? Well, something like this:

If you read this far, why not follow me on Twitter here.

Sunday, 25 September 2011

Why Facebook's 'Frictionless Sharing' violates HTTP

Facebook has this new feature, whereby the act of simply reading a web page, under certain conditions, gets it posted to your news feed, for your friends to see. Here's how ReadWriteWeb puts it:
With these apps you're automatically sending anything you read into your Facebook news feed. No "read" button. No clicking a "like" or "recommend" button. As soon as you click through to an article you are deemed to have "read" it and all of your Facebook friends and subscribers will hear about it. That could potentially cause you embarrassment and it will certainly add greatly to the noise of your Facebook experience. 
Facebook calls this 'frictionless sharing'. It has raised all sorts of ‘creepy’ flags, and rightfully so. A big reason for this is that it breaks a fundamental contract of web interaction, in place since the beginnings of the web, that users have come to rely upon: merely browsing a webpage (executing a GET, in HTTP talk) should not cause effects that you, the visitor, are responsible for. Posting to your news feed is a direct side-effect of your reading the article. You take no extra step to authorize it.

This violates a convention that is not there by accident. The HTTP Specification defines GET as a ‘safe’ operation, with certain guarantees. This line has been skirted for a very long time, but never by a company of this size, so publicly, and so blatantly. This is what the HTTP Spec has to say on the matter: 
9.1.1 Safe Methods
Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others. In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". 
[…] Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them. (emphasis mine) 
I don’t think it gets any clearer than that. It’s as if the HTTP committee had looked into the future and was personally addressing Mr. Zuckerberg. Now, the HTTP spec has no teeth. There is no enforcement body that goes around and metes out fines and punishments to violators. It is a gentlemen’s agreement, the contract that good citizens of the web should keep. As such, I think it merits at least a mention when large companies find new and ‘frictionless’ ways to undermine the foundation upon which they (and everyone else) are building.
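
To make the distinction concrete, here is a toy request dispatcher (hypothetical routes of my own invention, nothing to do with Facebook's actual API) that honours the convention: GET only retrieves, and the feed is only mutated by an explicit POST.

```python
def handle(method, path, feed):
    """Toy dispatcher: GET is 'safe'; only POST causes the share side-effect."""
    articles = {"/article/42": "Some article"}
    if method == "GET" and path in articles:
        return 200, articles[path]         # retrieval only, no side effects
    if method == "POST" and path.endswith("/share"):
        target = path[: -len("/share")]
        if target in articles:
            feed.append(target)            # a side effect the user asked for
            return 201, "shared"
    return 404, "not found"
```

A GET on /article/42 leaves `feed` untouched no matter how many times it runs; the user can only be held accountable for the POST they explicitly sent.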

Update: A number of people are pointing out the fact that the user authorizes the side effects by installing the app on Facebook. However, I assume Facebook also agrees to the HTTP Spec by implementing it. Does getting user authorization allow you to violate HTTP? I don't see any such language in the spec. I think the safeness of GET is one of those rights that you shouldn't be able to give away, even if you wanted to, as doing so undermines the web for everyone else.

If you read this far, consider following me on Twitter

Sunday, 18 September 2011

Three times Google’s ‘strategy’ got in the way of success: Skype, GDrive, Google+

I just finished reading Sriram Krishnan’s excellent post 'Don’t be so f*king strategic' and couldn’t stop thinking that the same disease must have infected Google as it had Microsoft.

Here are three times when the public learned of missteps of Google that were somehow related to a grand strategy of the company:


Skype

This is documented in Steven Levy’s book ‘In The Plex’ and the author has a more specific blog post on the issue. What it comes down to is this: someone at Google thought, and was able to convince the higher-ups, that peer-to-peer is old technology, not consistent with their cloud model, so Skype was worthless to them. The fact that this someone was from a product group that would have to compete with Skype internally goes unmentioned, but what is important is to see that Google in 2009 passed up the opportunity to buy Skype for a fraction of what Microsoft paid for it in 2011. Skype is now integrated with Facebook.

GDrive / Dropbox 

Drew Houston worried in his YCombinator application that Google would launch GDrive any day. It turns out Google didn’t, and Dropbox is a billion-dollar company today. Why? In The Plex has this to say: the Google Docs team was able to convince the higher-ups that files didn’t make sense in the cloud. File systems were a thing of the past, and so GDrive was abandoned while almost complete, the engineers sent to work on Chrome. It turns out that files aren’t quite as dead yet, and Google Docs itself now allows you to upload them. Recent rumours say that GDrive may have been resurrected and is getting launched, this time in a much more crowded space with credible independent competition.


Google+

The latest news is that Google+ is showing signs of decline. The causality here is not as strongly established, but the early demographic has not been happy with the real names policy. When I found out they were enforcing this policy, I was in disbelief. Surely when you’re entering a new market, you want to be friendlier than the competition; you want to be welcoming to those who were disenfranchised by your competitor. Google proceeded to shoot itself in the foot by affecting other services that the blocked users were on, shutting out ethnic groups whose names do not have a western structure, people who are known by a name that is not their legal name, as well as those who preferred to remain anonymous for security reasons. The statements coming out of the GooglePlex were to the effect that 'Google+ is not for everyone', and that they can't fight all the battles all the time. This is bizarre behaviour on its face, and I've learned that when smart people behave in ways that appear outright incompetent, there are usually higher-level considerations at play. It turns out that Google sees Plus as an ‘identity service’, a part of a grand strategy we’re not privy to (but can make guesses about). To put this in plain terms, Google is jeopardizing their bet-the-company move to attack a competitor because they have some masterplan that may or may not be what users want in the long run.

Those are three times, as far as we know, when Google let their ‘strategy’ get in the way of success. I’m sure more are known to insiders. I hope this has documented what I’ve been seeing: one of my favourite companies displaying a fondness for footguns. Here's my unsolicited advice to Google: stop being so f*cking strategic and just focus on building the world’s coolest technology, what people love you for, before you end up boring like Facebook.

Monday, 22 August 2011

Solving Causes' Levenshtein Distance challenge in 28 seconds. (Python)

Causes has recently put up a coding challenge for prospective engineers who want to work with them. Here is their description, short and sweet:

Two words are friends if they have a Levenshtein distance of 1. That is, you can add, remove, or substitute exactly one letter in word X to create word Y. A word’s social network consists of all of its friends, plus all of their friends, and all of their friends’ friends, and so on. Write a program to tell us how big the social network for the word “causes” is, using this word list. Have fun!
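
The friendship test in that description can be written out directly. Here's a naive sketch of just the definition (my own illustration, not part of the solution below, which avoids pairwise testing entirely):

```python
def are_friends(x, y):
    """True if x and y are at Levenshtein distance exactly 1."""
    if x == y:
        return False
    if abs(len(x) - len(y)) > 1:
        return False
    if len(x) == len(y):
        # one substitution: exactly one position differs
        return sum(a != b for a, b in zip(x, y)) == 1
    # one insertion/deletion: make y the longer word
    if len(x) > len(y):
        x, y = y, x
    # y must reduce to x by dropping a single character
    return any(y[:i] + y[i + 1:] == x for i in range(len(y)))
```

So "cause" and "causes" are friends (one insertion), as are "causes" and "pauses" (one substitution), but "cat" and "dog" are not.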

The corpus size is 264061 words. A friend wrote a C solution that solved it in sub-300 seconds and challenged me to do better. Looking at it, it seemed like it would benefit from using Python's amazing set() data structure, so I gave it a shot. After some horrible bugs, I got the right answer: 78482 (including the original word 'causes').

My Python solution is in 28 lines, including comments. It also runs in 28 seconds on my antiquated Core Duo. Someone on Hacker News found that PyPy also speeds this up by 70%. Not bad at all.

Other solutions I saw online take longer, which is why I decided to put this on here. Other attempts, (claimed runtime in parentheses): Ruby (50 sec), Ruby again (?), PHP (1.5 hours), Python (4 hours, 1 hour), Python again(?), Python redux(?), newlisp (?), C++ (1 min).

The secret behind my version's performance is Python's incredible set(), which does membership testing in O(1). I suppose the approach of generating all possible Levenshtein distance 1 strings and testing for membership also helped performance while keeping code size down. It's much easier to generate the small number of next steps rather than testing each of 260,000 options to see if they are a next step.
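
To put numbers on that (my own back-of-the-envelope count, not from the original challenge), a word of length n has at most this many distance-1 candidates:

```python
def neighbourhood_size(n, alphabet=26):
    """Upper bound on distinct Levenshtein-distance-1 strings for length n."""
    substitutions = n * (alphabet - 1)  # change one letter to a different one
    insertions = (n + 1) * alphabet     # insert one letter at any position
    deletions = n                       # drop one letter
    return substitutions + insertions + deletions

# "causes" (n=6): at most 6*25 + 7*26 + 6 = 338 candidate strings to look up
# in the set, versus comparing the word against all 264,061 corpus words.
```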

Without further ado, the code:

import string

w = set(open("00wordlist.txt").read().splitlines())
f, nf = set(), set(["causes"])

#from b, yield all unused words where levdist==1
def nextgen(b):
    for i in range(len(b)):              #for each index in b
        for c in string.ascii_lowercase: #for letters [a..z]
            if c != b[i]:
                #substitute b[i] with c
                if b[:i] + c + b[i+1:] in w:
                    yield b[:i] + c + b[i+1:]
                #inject c before b[i]
                if b[:i] + c + b[i:] in w:
                    yield b[:i] + c + b[i:]
        #remove b[i]
        if b[:i] + b[i+1:] in w: yield b[:i] + b[i+1:]

    for c in string.ascii_lowercase:     #for letters [a..z]
        if b + c in w: yield b + c       #append c after b

while len(nf):
    cf = nf
    nf = set([j for i in cf for j in nextgen(i) if j not in f])
    w -= nf
    f |= nf

print len(f)