Saturday, December 10, 2011

Churn rate definition

The other day I wrote a post on the Shopify blog on the definition of churn rate. I won't copy and paste it here but check it out if you're interested.

Thursday, May 19, 2011

Your n is probably a lot smaller than you think

If you have had a conversation with me about statistics you've probably gathered that I am a little hostile to the frequent use of p-values. This is partly because it is very easy to confuse their meanings. And I'm very happy that the Bayesians are out there fighting this battle and trying to get people to accept something better. But this isn't my real issue.

And there's the issue that many people aren't reporting adjusted p-values when using data mining techniques, which makes their results appear a lot more significant than they are (as brilliantly illustrated by xkcd). This is a major problem. But it's also not the problem I'm going to talk about here.

My issue is that even in the simplest of A/B tests that you may be running, your p-value is probably much less meaningful than your stats class taught you.

Consider that you are working for Blinko Laboratory supplies. You are trying to perfect a new automatic diagnosis device that detects the presence of a particular bacterium in blood samples. Right now you are testing a series of code patches to the detection algorithm to see if you can improve your accuracy. So you start off the device with 1,000 positive samples and it misses 156 of them. Pretty high failure rate, but that's why you're trying to improve the situation.

So you put in your first code patch that uses a fancy new elliptical curve technique for pattern detection (are elliptical curve techniques still the hotness; or am I dating myself here?). You put through another 1,000 samples and now the device only misses 104. Sweet! I mean you're still too high to release to the market but this result has a p-value of 0.0003 (using a quick normal approximation and the sample means to compute variance). You must have done something. So you apply the next patch. And the error count goes up to 150. Shoot, that was a bad patch. The p-value of 0.001 suggests this sway wasn't random noise. This patch must have caused harm. Well, you undo that last patch and your error count drops to 141.
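For anyone who wants to check the arithmetic, here is a minimal sketch of that quick normal approximation. It assumes a one-sided test with unpooled variance, each variance estimated from its own sample mean. The post doesn't say which test was used; that choice is my assumption, picked because it reproduces the quoted figures. The function name is made up for illustration.

```python
import math

def one_sided_p(misses_a, misses_b, n=1000):
    """One-sided p-value for the difference between two miss rates.

    Normal approximation with unpooled variance, where each variance is
    estimated from its own sample mean: p * (1 - p) / n.
    """
    p_a, p_b = misses_a / n, misses_b / n
    std_err = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    z = abs(p_a - p_b) / std_err
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

# The three consecutive comparisons from the story.
for before, after in [(156, 104), (104, 150), (150, 141)]:
    print(f"{before} -> {after}: p = {one_sided_p(before, after):.4f}")
```

The first two comparisons come out at roughly 0.0003 and 0.001. The last one (150 to 141) is nowhere near significant, at roughly 0.28.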

Wait, what? The statistics are suggesting that it is far more likely that your last patch never got undone. So you start debugging and you notice that the device has been disconnected from the network the whole time. But how is that possible?! Surely something changed between your first test and your second test. As well as between your second test and your third test. You have the p-values to prove it. These p-values must guarantee you something, right?

But here's what you didn't know. Every 200 trials the machine performs a quick self-cleaning and scrubs its lens. But this process has a 20% chance of leaving a streak on the lens. When there's no streak your device is able to detect 90% of the infections but when there is a streak it only detects 60% of the infections (the numbers used were generated using this distribution). So while you thought the proper n to use for your calculation was 1,000 there was also this other n of 5 hiding in the machine. Not to say 5 is the proper n to use either; you actually have a completely different distribution. (Is this a convolution? I can never get that definition straight.)
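The streaky lens is easy to sketch in code. This is not the script that generated the numbers above, just a minimal reconstruction. It assumes the lens state is drawn independently for every block of 200 trials, including the first. All names and constants are my own labels for the figures in the paragraph above.

```python
import math
import random

BLOCK = 200          # trials between self-cleanings
P_STREAK = 0.2       # chance a cleaning leaves a streak
MISS_CLEAN = 0.10    # miss rate with a clean lens (90% detection)
MISS_STREAK = 0.40   # miss rate with a streaked lens (60% detection)
MEAN_RATE = P_STREAK * MISS_STREAK + (1 - P_STREAK) * MISS_CLEAN

def simulate_misses(n=1000, rng=random):
    """Count misses over n positive samples, redrawing the lens state each block."""
    misses = 0
    for start in range(0, n, BLOCK):
        miss_rate = MISS_STREAK if rng.random() < P_STREAK else MISS_CLEAN
        block_size = min(BLOCK, n - start)
        misses += sum(rng.random() < miss_rate for _ in range(block_size))
    return misses

def naive_sd(n=1000):
    """Standard deviation of the miss count if all n trials were independent."""
    return math.sqrt(n * MEAN_RATE * (1 - MEAN_RATE))

def streaky_sd(n=1000):
    """Standard deviation of the miss count under the block model (n a multiple of BLOCK)."""
    # Binomial noise inside a block, averaged over the two lens states.
    within = BLOCK * (P_STREAK * MISS_STREAK * (1 - MISS_STREAK)
                      + (1 - P_STREAK) * MISS_CLEAN * (1 - MISS_CLEAN))
    # Extra variance because the block's miss rate is itself random.
    rate_var = (P_STREAK * MISS_STREAK ** 2
                + (1 - P_STREAK) * MISS_CLEAN ** 2 - MEAN_RATE ** 2)
    between = BLOCK ** 2 * rate_var
    return math.sqrt((n // BLOCK) * (within + between))

print("four simulated runs:", [simulate_misses(rng=random.Random(seed)) for seed in range(4)])
print(f"naive sd: {naive_sd():.1f}, streaky sd: {streaky_sd():.1f}")
```

Under these assumptions the miss count over 1,000 samples has a standard deviation of about 55, against about 12 if every trial really were independent. A swing of 50 misses between runs is within one standard deviation.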

Sure, there are simple techniques that could have caught this. But how often are these actually done? I know I can check if my errors are auto-regressive in serial testing. How often do I do it? How many simple techniques do I need to make me impervious?
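As one example of such a simple technique, here is a minimal sketch of a lag-1 autocorrelation check on a sequence of hits and misses. The function name and the two toy sequences are made up for illustration. It is only one of many possible checks, and it assumes you kept the outcomes in the order they were run.

```python
def lag1_autocorrelation(outcomes):
    """Lag-1 sample autocorrelation of a sequence of 0/1 outcomes (1 = miss)."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    deviations = [x - mean for x in outcomes]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        return 0.0  # constant sequence: no variation to correlate
    # Products of each deviation with the one that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator

clustered = [0, 0, 0, 0, 1, 1, 1, 1]    # misses arrive in a run
alternating = [0, 1, 0, 1, 0, 1, 0, 1]  # misses never repeat
print(lag1_autocorrelation(clustered))
print(lag1_autocorrelation(alternating))
```

For independent trials the value should sit near zero, roughly within 2/sqrt(n) of it as a rule of thumb. A clearly positive value says misses are clustering, which is what a hidden block effect would produce.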

Clearly what I'm pointing out here is nothing new. Some people call this systemic noise. Or this is the issue of assuming independence. This isn't exactly a Black Swan I'm talking about here because this isn't a low-incidence, high-impact event. This is more like a medium-incidence, medium-impact event.

But think how many hidden low n's are affecting the results in your business. Maybe you are in the credit card business and you want to forecast how many proposals you will receive in a month. How many TV ad buys do you think are made in Toronto by credit attorneys? Under 20? What would the impact be on your Toronto proposals if there were two more?

Or you are trying to predict traffic to your website. How many blog posts are written about your site each month? 10? What if there was one fewer? (And surely you know that the number of blog posts written about your site isn't Poisson.)

This isn't to say you shouldn't have any understanding of the naive p-values implied by your results. You should be able to recognize when a result falls within your statistical noise. But please don't spend a huge amount of time trying to sharpen the precision of your p-value and your standard error. If the extra work you are doing doesn't increase the precision of your p-value by at least an order of magnitude there are probably better ways you could be spending your time. Lower your error; don't sharpen it.

For example, consider Nate Silver's approach. Just before an election whose results he has predicted, he writes about creative scenarios that would lead to his predictions being radically wrong. He's not computing probabilities for these events; just noting them as possibilities and considering their impacts. He's spending less time computing and more time thinking. Thinking about the impacts of events that are completely outside your model is something most of us spend far too little time doing.

Wednesday, February 09, 2011

Is bandwidth the new corn?

With the recent pushback by the Canadian Conservative party against the CRTC's choice to allow usage-based billing for internet I can't help but wonder if we are seeing the beginning of another corn in the economy.

It is pretty well recognized that while corn does grow pretty well, the reason it is the go-to ingredient in so many products that we buy is its regulated availability and subsidization. Because it is subsidized marginal profits don't fall off (marginal costs should go up eventually but subsidies compensate) so you don't get a point where you stop producing. So we can't ever get to equilibrium demand. Prices don't rise to the point where buyers start considering alternatives. Alternatives that would be more efficient if not for the subsidies. Subsidies we all eat.

Aren't we slowly moving in this direction for on demand video? There are two extremes for non-physical delivery for on demand video. One is the deliverer has a separate stream of the same video for each watcher delivered at the time of demand. If three neighbours all want to watch the same show at the same time it takes up three times the bandwidth it would if one person wanted to watch it.

The other extreme is the video is delivered to everyone at the same time via broadcast and stored at the household to be played at time of demand. In this case storage is the alternative to bandwidth.

Which is more efficient? Well clearly that depends on the popularity of the video. If it is very popular the broadcast and store method (or rather the PVR method) is more efficient. If the consumer were to feel the cost of the resources he is consuming the market should move towards this efficiency. But without some form of usage billing the consumer sees no benefit to the PVR method over the Hulu/Netflix/web streaming method. PVRs are almost certainly more popular than streaming TV shows at the moment. But there is no extra cost to streaming House rather than recording and playing House.

So where does this get you? This gets you to the world of corn. A world where the costs are externalized. Or another parallel, our streets with respect to traffic and gasoline. On our streets our roads, our gas, and now even our cars are all subsidized with an aim towards universal access. Because of this the consumer (i.e. the commuter) doesn't feel the real costs of his decisions. The cost of traffic is externalized but the benefits of convenience and independence are fully realized. So we get busy roads instead of an exodus to mass transit [1].

Yes, ISPs can put down more last mile cable to increase bandwidth to give users this free choice. And corn producers can always produce more corn. And we can always build more roads. Of course this never sounds like a stupid answer when we take the first step down the road of externalizing cost. I'm sorry. I meant to say universalizing access.

[1] Note that mass transit is also subsidized, and almost certainly more heavily subsidized than driving since it also receives the subsidization of roads and gas. But the point is that driving is subsidized but the rewards are not diminished.

Friday, December 03, 2010

How Groupon is nailing it (because you suck at evaluating prices)

It's easy to look at Groupon's success and shrug with "who knew people wanted coupons?" But that misses a big part of the magic. Groupon has figured out a way to present their coupons that manages to double their perceived value, and not in a way that misleads anyone.

To understand how they do this you first have to acknowledge that you suck at evaluating prices. So do I. So do experts. You can read about all the reasons why you suck at prices in Dan Ariely's Predictably Irrational or William Poundstone's Priceless. They'll talk about a range of reasons like anchoring and default choice and comparable items. But what Groupon takes advantage of is that when it comes to price we're all about ratios.

The classic example of this: consider that you are shopping for a pair of jeans. You find a pair you like at the beginning of the day for $80 but you decide to keep looking because you've just started shopping and you may find a better price. At the end of the day you're in another store, across town, and they have the exact same pair of jeans for $130. You're now 30 minutes away from the original store, and you're tired, and there's traffic. Would you go back for the better deal? Most people would say they would.

But let's say instead you are shopping for a laptop. Now the last store you are in is selling it for $1,550 and the original store is selling it for $1,500. Now do you go back? Most people would say they wouldn't. But the weird thing is the $50 savings has nothing to do with the product. On the one hand the deal is:

1. Get the thing you want
2. Pay out the lower price
3. Spend 30 minutes in traffic

and in the other case the deal is:

1. Get the thing you want
2. Pay out the lower price
3. Pay an extra $50.

Really in both cases the question is would you spend 30 minutes in traffic to save $50. (The response would be different still if it were framed as spend 30 minutes in traffic to be paid $50. But that's neither here nor there.) But we instead fall into the trap of thinking about ratio savings.

So how does this apply to Groupon? Let's consider the deal I was offered today. Get a $125 coupon for Indochino.com for $50. That's great, right? $125 value for $50; that's 60% off. Pretty good. But you haven't actually got anything yet. So what can you get for $125 from Indochino? Well their specialty is custom suits that start at $299. So $125 is about 40% off. Still pretty good.

But really, your entire savings is never more than $75. Which means your savings off that suit is only around 25%. By splitting up the purchase of buying a suit from Indochino into buying a coupon from Groupon and then paying the rest at Indochino, Groupon has managed to leverage the savings rate from 25% up to 60%.
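The three percentages all come from the same three dollar figures, which is easy to see when the arithmetic is laid out. The sketch below uses only the numbers from the deal described above; the variable names are mine.

```python
coupon_value = 125   # face value of the Groupon
coupon_price = 50    # what you pay Groupon for it
suit_price = 299     # starting price of an Indochino custom suit

savings = coupon_value - coupon_price        # dollars you actually save

at_groupon = savings / coupon_value          # how the deal looks when buying the coupon
at_indochino = coupon_value / suit_price     # how it looks when paying for the suit
true_discount = savings / suit_price         # what you really saved on the suit

print(f"buying the coupon: {at_groupon:.0%} off")
print(f"buying the suit:   {at_indochino:.0%} off")
print(f"actual savings:    {true_discount:.0%} off")
```

The dollar savings never changes. Only the denominator does.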

So you would figure if they are using leverage to make the first half of the transaction feel like a good deal then the second half of the transaction must feel like a terrible deal. Not quite. Remember that 40% number we arrived at? When we are actually buying the suit, which is a second transaction that doesn't occur on Groupon, that $50 we paid is out of mind. It's a sunk cost. And we ignore sunk costs. (Actually sometimes we're really good at ignoring sunk costs, and sometimes we're terrible at it. For example, even if Indochino were to suddenly jack up their prices we would have a hard time not spending that coupon because it cost us $50 which we are resistant to ignoring.)

So the result is after the second transaction you've saved 40% thanks to Groupon. Feels pretty good. Feels like brand loyalty. Maybe you'll buy another coupon.

So Groupon has managed to transform a 25% savings into a 40% savings and a 60% savings. Did I start out by saying they doubled the perceived value?

(40%+60%)/25% = 4X

Turns out they actually quadrupled the perceived value. Pretty clever.

Saturday, November 06, 2010

I want to make a map/reduce logistic regression machine in December. Who's on board?

Because of my work, logistic regression has become one of my favourite analytic tools. But now I've crossed the point where everything looks like a nail (which it does) and I'm at a point where I want to make my own hammer (for all these nails that are piling up). So why not write this to work in the space where the future of large dataset analysis is probably going to happen: the Hadoop map/reduce world.

I figure I will have newfound time in December (I write the first CFA exam on December 4th) so I might as well try to do this. First step will be to make a simple weighted linear regression machine. If I can implement one in Excel surely it can't be so hard to implement one anywhere else. Then figuring out the actual algorithm will be a combination of digging into the R source code, using some common sense, and talking to a friend who has actually built one of these before.
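To give a feel for why weighted linear regression fits the map/reduce shape, here is a minimal sketch in plain Python. It is not Hadoop code and not the machine described above. It assumes a single predictor plus an intercept, and every function name is made up for illustration. The point is that each chunk of data can be boiled down to five weighted sums, and sums from different chunks just add. Since logistic regression is commonly fit by iteratively reweighted least squares (which is what R's glm does), a step like this is the natural building block.

```python
from functools import reduce

def map_chunk(rows):
    """Map step: boil one chunk of (x, y, weight) rows down to five weighted sums."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in rows:
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    return (sw, swx, swy, swxx, swxy)

def combine(a, b):
    """Reduce step: sums from different chunks simply add."""
    return tuple(p + q for p, q in zip(a, b))

def solve(sums):
    """Solve the weighted normal equations for intercept and slope."""
    sw, swx, swy, swxx, swxy = sums
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

def weighted_fit(chunks):
    return solve(reduce(combine, map(map_chunk, chunks)))

# Toy data lying exactly on y = 2x + 1, split across two chunks.
chunks = [
    [(0.0, 1.0, 1.0), (1.0, 3.0, 2.0)],
    [(2.0, 5.0, 0.5), (3.0, 7.0, 1.5)],
]
print(weighted_fit(chunks))
```

How the rows are split into chunks doesn't change the answer, which is what makes the computation easy to distribute.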

But I'd love help if anyone's game. Even just to answer questions. Like, how do I actually set up a test Hadoop server? Or more importantly, is this a silly exercise?

Thursday, November 04, 2010

I use SAS and Maple the same way... and nothing else

Today I was thinking about how I use SAS: write some code that creates some stuff, select and run it, write some more code that looks at that stuff (usually in a separate tab), select and run it, then write some code that builds more stuff off of the old stuff, and repeat.

At times it's awkward and ungraceful. But it's also super handy that I don't have to know ahead of time that some code is going to take a few minutes to run so I had better write some code that stores the output somewhere. And it's also almost exactly how I used Maple. And how I understand one uses Mathematica (which I've used all of once, but I think I would love it if I used it more).

But I use no other language this way. And I've had opportunity. There were several years, after I had mostly stopped using Maple and before I ever touched SAS, when I wrote a lot of JavaScript and PHP. And sometimes some Python and maybe some Ruby. And Java keeps popping up. Oh, and R. But this pattern of use has never been a consideration for me for anything other than SAS or Maple.

Is there a way to use any of these languages this way? Is there a tool out there that I'm missing?

Friday, September 10, 2010

If credit is so tight then why do I keep getting offered more credit?

About a week ago I posted a comment on the Atlantic's business blog. I thought it might be of interest to those who read here as well (the both of you).

Let me suggest what I think is happening in terms of the apparent contradiction in lenders being both too tight and at the same time offering too much money. The credit industry has gotten very good at classifying you into a risk category to determine chances that they will have to write off the money lent into that category.

As you may imagine, if the lender sees an expected profit given the risk of lending to people in that risk category, they will do it. The problem is that modeling how much the risk of default changes as you extend someone's debt obligations (known as modeling your credit capacity) is not a well-solved problem. This is the sort of tool you need to determine someone's limit.

This hasn't been that much of an issue because a lender usually limits how much they will lend out based on their own available exposure, and they just spread it around to as many borrowers as possible.

But when modeled risk suddenly shrinks the number of profitable borrowers this constraint is no longer sufficient (even if the bank's acceptable exposure is also shrinking). And beyond that, their models are showing they can make up some of their lost profits by lending more into still profitable categories.

There actually do exist a few tools for modeling credit capacity, but at the moment they are either crude (like using a flat ceiling on the income to total debt service ratio) or unproven. There certainly are people trying to solve this problem, and some institutions may even have working solutions. But it is a problem that not all institutions have solved yet.

I should add that in Canada there is the additional issue that credit card companies now need permission from the card holder to extend the card holder's credit limit. This means that risk to exposure formulas will need to be modified as new data comes in on how accepted limit increases are used. Until that is sorted out card companies will feel like they have all this extra safe exposure potential from unaccepted limit increases that they desperately want to use up.