November 2008

26 Nov 2008
echoes and explanations

I haven't posted here in months, and only ten times in the past year. I'm not trying to lull you into a false sense of complacency; your complacency is justified. Embrace it. I have, however, been posting lots of photos (here and on Flickr) and sharing links through Friendfeed (visible in a sidebar on my own site). I have been up to interesting things, but mostly not things I want to share with the whole world.

Yesterday, Rafe pointed out a NY Times article about the Netflix competition and called out this quote:

[W]hile the teams are producing ever-more-accurate recommendations, they cannot precisely explain how they're doing this. Chris Volinsky admits that his team's program has become a black box, its internal logic unknowable.

That's a little misleading. The programmers do know the details of how their programs work, but those details are a collection of statistical optimizations that don't neatly follow how people consciously make or explain their own decisions. That makes the overall effect unintuitive, even though the internal logic itself is known.

It reminded me of some work I heard about in grad school, presented by a guest lecturer in my machine learning class (whose name and affiliation I sadly don't recall). He'd worked on a system that made medical diagnoses. You enter the patient's symptoms and perhaps other personal data, and it tries to guess what's wrong with them based on data from previous cases.

They'd made two systems. One used k Nearest Neighbor (kNN), which works by finding the most similar prior cases (k of them, where k is some relatively small number you like) and guessing that the case in question is probably like those cases.
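For the curious, here's roughly what kNN looks like in code. This is a minimal sketch, not the lecturer's system: the symptom encoding, the toy cases, the Euclidean distance, and the name `knn_predict` are all my own assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(query, cases, k=3):
    """Return (majority label, the k nearest prior cases) for a query vector."""
    # Sort prior cases by Euclidean distance to the query and keep the closest k.
    nearest = sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]
    # Guess the label that occurs most often among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest

# Toy data, invented for illustration: (fever, cough, chest_pain) -> diagnosis.
prior_cases = [
    ((1.0, 1.0, 1.0), "pneumonia"),
    ((1.0, 1.0, 0.0), "pneumonia"),
    ((0.0, 1.0, 0.0), "cold"),
    ((0.0, 0.0, 0.0), "healthy"),
    ((1.0, 0.0, 0.0), "flu"),
]

guess, neighbors = knn_predict((1.0, 1.0, 1.0), prior_cases, k=3)
print(guess, neighbors)
```

Note that the function hands back the neighbors along with the guess. Those neighbors are the "similar prior cases" a doctor could actually look at.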

The other system used a simple neural network, and it was more accurate. The problem was that doctors didn't trust the results of any automated diagnosis and wanted the computer to make a coherent argument supporting its decision. The neural network could explain itself, but not in a useful way. ("I think the patient has pneumonia, because when you add up these fifty thousand numbers using these weights that I computed from your training data, then combine them in this way, you get the following matrix, which ...") The kNN system could present the similar cases it found, which was easy to understand, but its predictions weren't as good.

The dichotomy is mildly interesting, I guess, but the fun part is how they solved it with a hybrid system.

First, a little background. Their neural net was a fairly common topology: a feed-forward network with a single hidden layer with fewer internal nodes than inputs or outputs. The inputs might have been the patient's symptoms (coughing? feverish? blue?) and background (age, sex, personal or family history of various things), and the outputs could have been likelihoods of various diagnoses, the highest of which would be the system's best guess. For any given set of inputs, the activation pattern of the hidden layer can be thought of as a summary of the case. It's not a summary a doctor would want to read, but it's a projection of the data into a lower-dimensional space that simplifies the data in a way useful for making predictions, similar to how people are using Singular Value Decomposition in the Netflix contest.
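Here's a sketch of that topology, to make "the activation pattern of the hidden layer" concrete. Everything specific is an assumption on my part: the layer sizes (6 inputs, 2 hidden nodes, 4 outputs), the sigmoid activation, and the random, untrained weights. A real system would learn the weights from training data; the point here is only the shape of the computation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(weighted sum + bias) for each node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_net(n_in, n_hidden, n_out, seed=0):
    """Build an untrained network with random weights (illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_hidden": matrix(n_hidden, n_in), "b_hidden": [0.0] * n_hidden,
        "w_out": matrix(n_out, n_hidden), "b_out": [0.0] * n_out,
    }

def forward(inputs, net):
    """Return (hidden activations, output scores) for one case."""
    hidden = layer(inputs, net["w_hidden"], net["b_hidden"])
    outputs = layer(hidden, net["w_out"], net["b_out"])
    return hidden, outputs

net = random_net(n_in=6, n_hidden=2, n_out=4)
case = [1.0, 0.0, 1.0, 0.0, 0.35, 1.0]       # hypothetical encoded symptoms and background
hidden, outputs = forward(case, net)
best_guess = outputs.index(max(outputs))      # index of the highest-scoring diagnosis
```

The two numbers in `hidden` are the "summary of the case": six inputs squeezed down to two values that the output layer then uses to score each diagnosis.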

And now the clever trick. They used the neural net to make the diagnosis, and used kNN to justify its prediction, but instead of finding cases similar in the high-dimensional space of the inputs, they ran kNN on the hidden layer activations of all the training cases. The system couldn't explain in a non-numerical way how those particular cases were similar to the one in question, but generally it didn't have to, because the doctors could see the similarities for themselves once the relevant cases were pointed out.
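A rough sketch of the justification half of that trick follows. I'm assuming Euclidean distance and a tiny hand-picked hidden layer (`W_HIDDEN`, with tanh activations) standing in for a trained one; the case IDs and inputs are made up. The diagnosis itself would still come from the network's output layer, which I've left out.

```python
import math

# Hypothetical hidden-layer weights: 2 hidden nodes, 4 inputs.
# In the system described, these would come from training the network.
W_HIDDEN = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

def hidden_activations(inputs):
    """Project a case's inputs into the (smaller) hidden-layer space."""
    return tuple(math.tanh(sum(w * x for w, x in zip(row, inputs)))
                 for row in W_HIDDEN)

def similar_cases(query, training_cases, k=2):
    """Return the k training cases nearest to the query in hidden-layer space."""
    q = hidden_activations(query)
    return sorted(
        training_cases,
        key=lambda case: math.dist(q, hidden_activations(case["inputs"])),
    )[:k]

training_cases = [
    {"id": "A", "inputs": (1, 0, 0, 0)},
    {"id": "C", "inputs": (0, 0, 1, 0)},
    {"id": "D", "inputs": (0, 0, 1, 1)},
    {"id": "E", "inputs": (1, 1, 0, 0)},
]

query = (0, 1, 0, 0)
evidence_ids = [case["id"] for case in similar_cases(query, training_cases)]
print(evidence_ids)  # ['A', 'E']
```

In raw input space, case A is exactly as far from the query as case C. In hidden-layer space, A lands on the same point as the query, because this toy network treats the first two inputs interchangeably. That's the kind of similarity the network "sees" and plain kNN on the inputs doesn't.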

In one sense it's just dimensionality reduction, which is something you usually end up doing with kNN if you want decent results on high-dimensional data. But the neural net was probably still more accurate than kNN operating on its hidden layer, and I thought it was clever to use one algorithm for the prediction (the primary objective) and piggyback another on its internal state to argue for its validity (a related task that turned out to be required). It's a bit like handing your brain to a friend so he can explain something from your point of view.