In my current work I spend a lot of time with models that try to predict the likelihood of a customer's future action. From time to time we need to model something new quickly, and in those cases it is very tempting to take a different but similar model and see if it can simply be re-aligned to fit the new circumstance. Sometimes this works fine, but it is striking how often it fails dramatically.
But I have to be careful about writing about specific examples from work, so let's consider a hypothetical example instead to see the danger of assuming a model extends.
Consider a software company that has started a pilot program giving potential clients six-week trials of its product. The company would like to know as soon as possible which sort of customer is most likely to convert, so it can focus its sales force most effectively. Rather than wait six weeks for results, they decide to model who converts in the first week (i.e. a target of one-week conversion). That way, after one week plus development time, they will have a model they can start using. While this model won't be very accurate at predicting the probability of converting in six weeks, surely it will rank-order customers the same way a model built on the full observation period would. After all, if a customer who doesn't convert in week one has similar characteristics to someone who does, it seems fair to conclude they just need a little more of a push to become like their cohorts.
However, this model fails disastrously. Those who were supposed to be the most likely to convert ended up being the least likely. So the model is revisited to see what could have gone wrong. It is discovered that the biggest drivers of the model were:
- heavy use of the software
- frequent interaction with the sales department (questions about price, etc.)
- indicating on the questionnaire that the software is for an urgent project
- asking many questions about the features locked by the trial
And that's when the analyst hypothesizes his mistake: instead of targeting the desire to convert, he suspects he has targeted the need to make a decision quickly. So those who were rated highly by the model and didn't convert in the first week didn't convert because they had decided, in that very week, that they would never convert.
To test this theory the analyst checks whether those who were predicted to convert, but did not, had bought the competitor's software (the trial includes some spyware, so he can check for such things). And there is the answer. Not only had they bought the competitor's software in large numbers, but usually within the first week of the trial. Those considered the best leads had, unbeknownst to the company, already ruled themselves out of being future customers.
Now, real-life examples are rarely this clear-cut. In reality, the population with a high likelihood of converting in the first week would probably be made up of a mixture of types. Instead of moving as a group to being the worst converters, you would probably see some sort of indeterminate result.
So how could this mistake have been prevented? First, by validating that there was some justification for extending the model. Perhaps the analyst could have validated against the second week, and so discovered his mistake sooner. But more importantly, he waited too long to try to understand his variables. When building a model like this it is important to understand why each of its drivers is actually predictive, and then to test the relationships you believe exist. That is, construct a narrative that explains why each variable predicts the way it does, consider the other consequences of that narrative, and see whether those consequences show up in your data.
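That second-week check can be sketched in a few lines. The idea is to measure whether the one-week model's scores still rank-order conversions observed over a longer window, using a simple pairwise concordance statistic (rank AUC). Everything below — the scores, the outcomes, the function name — is hypothetical and illustrative, not the analyst's actual model or data.

```python
# Hypothetical sketch: check whether a one-week model's ranking
# survives a longer observation window. All numbers are made up.

def rank_auc(scores, outcomes):
    """Probability that a randomly chosen converter outscores a
    randomly chosen non-converter (ties count half). A value near
    0.5 means the scores carry no rank information; a value well
    below 0.5 means the ranking has inverted."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Scores from the (hypothetical) one-week model for ten trial users,
# sorted from most to least promising.
week1_scores = [0.91, 0.85, 0.77, 0.70, 0.62, 0.55, 0.41, 0.33, 0.21, 0.10]
# Whether each user had converted by the end of week two.
week2_converted = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

auc = rank_auc(week1_scores, week2_converted)
print(f"rank AUC against week-two outcomes: {auc:.2f}")
if auc < 0.5:
    print("warning: the model's ranking inverts beyond week one")
```

In this made-up data the customers the model likes best convert least over the longer window, so the AUC comes out far below 0.5 — exactly the inversion the analyst eventually found the hard way. Running the same check each week of the trial would have surfaced the problem long before the six weeks were up.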
Even though we aren't wearing lab coats, it is a good idea to keep in mind that we are still doing science. The scientific method makes a pretty good guide.