Thursday, May 03, 2012

Analyst, measure thyself

The other day I had a nice example of my work failing that I thought I would share. At Shopify I have several models that predict all sorts of future behaviours of sellers, purchasers, signups, etc. Earlier this year there was a change in the signup population that dramatically reduced the effectiveness of one of these models.

What's interesting is I didn't discover this myself. One of our VPs did.

It is certainly never fun to have a co-worker discover that some of your work is inadequate. It is even less fun when that co-worker is a VP. But what was cool was that he found out by looking at a self-updating report; a report that I wrote.

What I had made was a web page that broke month-old signups into five groups based on the model. The groups were ordered by the model's predicted chance that their members would convert into paying customers, and for each group I listed the rate at which it actually did convert. In an ideal world the bottom group would have converted at close to 0% and the top group would have converted at close to 100%. Let's just say that when this VP looked at this page the conversion rate of the bottom group was substantially higher than 0% and the conversion rate of the top group was substantially lower than 100%.
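A report like that takes only a few lines to build. Here is a minimal sketch in Python with pandas, assuming a DataFrame of month-old signups with a `predicted_probability` column produced by the model and a boolean `converted` column; the column and group names are mine for illustration, not the actual report's:

```python
import pandas as pd

def conversion_report(signups: pd.DataFrame) -> pd.DataFrame:
    """Split signups into five groups by predicted probability and
    compare each group's predicted and actual conversion rates."""
    signups = signups.copy()
    # qcut puts roughly the same number of signups in each group,
    # ordered from lowest to highest predicted chance of converting.
    signups["group"] = pd.qcut(
        signups["predicted_probability"], 5,
        labels=["bottom", "low", "middle", "high", "top"],
    )
    return signups.groupby("group", observed=True).agg(
        signups=("converted", "size"),
        predicted_rate=("predicted_probability", "mean"),
        actual_rate=("converted", "mean"),
    )
```

If the model is doing its job, `actual_rate` should climb steeply from the bottom group to the top group; when it flattens out, the model has stopped earning its keep.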

I'd love to say that my initial response was to accept that my model was no longer performing. Instead I tried to explain why the report was misleading. I insisted that the situation was complicated and that you couldn't just look at a simple table and understand what was going on. But it didn't matter, because the evidence was too convincing and it showed that the situation was actually pretty simple: the model had to be rebuilt. And the evidence had come from me.

Most analysts are perfectly comfortable with the idea that the best way to know whether a marketing campaign is succeeding, or whether a design change is making customers happier, is to measure the results and then try to prove the opposite. And yet we often fail to hold our own work up to the same scrutiny.

With that in mind I recommend your modeling projects include these steps:

  • Have quality tests for all the models you create. The tests need to be simple and require minimal explanation, and it should be easy for any other analyst to replicate your test just by looking at how you present the report. When you finish a model you need to be able to say "this is what I have done, and here is my proof."
  • Have fit tests for all the techniques you use. For example, if you are using a curve-fitting technique to predict the future, show how that technique would have predicted the present using only past data (a minimal backtest is sketched after this list). You also need to show that your choice of model type is a good one: even if you've used SVMs to predict these sorts of results in the past, you still need to show that SVMs work with this data.
  • Write your tests before you start. This way you can't later avoid writing the tests that would expose your weak spots. It also helps you know when you are done: if your test results haven't improved in the last few days, your work has stopped accomplishing anything. Plus it is motivating to watch your test numbers improve as you work. I'm pretty sure test-driven modeling has been practiced longer than test-driven development.
  • Even your tests should have tests. They need to reconcile with your accounting numbers: if your test reports revenue from customers, the total should match your officially reported revenue (a check along these lines is sketched after this list).
  • Your tests need to be able to alert someone when they start failing. I dropped the ball here last time (the same sketch below includes a simple alert).
  • Make your tests public. This takes away your option in the future to decide that this test doesn't apply. More importantly, the users of your model are going to have a much better understanding of what it does if they can see its past behaviour. Even better would be to make your tests public from day one.
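On the fit-test point, the simplest version is to pretend that some past date is "today": fit the technique only on the data available before that date and see how well it predicts what actually happened afterwards. A minimal sketch, assuming a series of historical values and a polynomial trend fit; both the series and the error metric are placeholders for whatever you actually use:

```python
import numpy as np

def backtest_curve_fit(values, holdout=3, degree=2):
    """Fit a polynomial trend to all but the last `holdout` points,
    then measure how badly it predicts the points it never saw."""
    values = np.asarray(values, dtype=float)
    train, test = values[:-holdout], values[-holdout:]
    x_train = np.arange(len(train))
    x_test = np.arange(len(train), len(values))
    coefficients = np.polyfit(x_train, train, degree)
    predicted = np.polyval(coefficients, x_test)
    # Mean absolute percentage error on the held-out "present".
    return float(np.mean(np.abs(predicted - test) / test))
```

If the technique can't predict the present from the past, there is no reason to believe it can predict the future from the present.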
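As for testing the test and alerting, that can be as crude as a couple of checks run every time the report updates. A sketch, assuming the report from the earlier sketch also carries a revenue column and that the threshold numbers are purely illustrative:

```python
def reconcile_and_alert(report, accounting_revenue,
                        tolerance=0.01, min_spread=0.20):
    """Return a list of alert messages: one if the report stops
    reconciling with accounting, one if the groups stop separating."""
    alerts = []
    # Test of the test: the report's revenue total should match accounting.
    gap = abs(report["revenue"].sum() - accounting_revenue)
    if gap > tolerance * accounting_revenue:
        alerts.append(f"report revenue is off by {gap:.2f}")
    # Test of the model: the top group should still convert well ahead
    # of the bottom group.
    spread = report.loc["top", "actual_rate"] - report.loc["bottom", "actual_rate"]
    if spread < min_spread:
        alerts.append(f"top/bottom conversion spread has fallen to {spread:.1%}")
    return alerts  # non-empty means someone should get an email
```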

Note that these tests are not a substitute for model validation. There will probably be overlap in the types of tests, but you still need a separate validation holdout.

Finally, consider this: if the defense of your model is a description of the techniques you used, you're doing it wrong. The defense of your model is its measured performance. If your model is useful and it performs well, then people will ask you about your techniques. You don't show that a new glass technology is unbreakable by describing the molding process; you show it by giving a man a baseball bat and letting him try to break it. And if he fails, give him a bigger bat.
