*March, 2014*

When I was at university, I had a very amusing and offbeat researcher as a tutor for the “*Probability and Statistics*” class. One day, during his lesson, he taught us a life lesson about his subject that I have never forgotten.

The lesson started from the widely accepted idea, supported by many studies, that there is a precise correlation between *smoking cigarettes* and *lung cancer*.

Starting from a significant dataset, this researcher illustrated one of these studies, showing that the *percentage of people with lung cancer who had a past as smokers* was **high** (I cannot remember the exact value, but something meaningful, like more than 75%). It seemed clear proof of direct causation between the two facts.

Then he explained that the now-common term ‘Bayesian’ was coined by an eminent statistician named **R.A. Fisher**, and originally carried a denigrating connotation.

R.A. Fisher, besides not believing in Bayesian probability, also did not believe that smoking could cause lung cancer, and in a masterful rebuttal he showed that, starting from the same dataset, *the percentage of smokers who actually died of lung cancer* was incredibly **low**.

Then, this researcher asked:

“Do you know why R.A. Fisher intervened in this subject?

He was a smoker. And he was a paid consultant for the tobacco companies”.

The life lesson was:

“Every time you see a probabilistic value, ask yourself who stands behind it, and then ask yourself why”.

Now back to our days.

One of the best metrics for evaluating your forecasts is probably **calibration**. Calibration compares the probability you assign to an event (a *probabilistic* factor) against the actual observed frequency of that event (a *statistical* factor).

Let’s take an example: you predict a 60% probability of rain in a certain time interval; if the actual observed frequency is 55%, your predictions are accurate. If it turns out to be 20%, maybe it’s time to review your model (or your work, or the damn weather in general).
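This check can be done mechanically. Here is a minimal sketch in Python (the forecast data and the `calibration_table` helper are made up for illustration, not taken from any real forecast record): it groups forecasts by their predicted probability and compares each group’s average prediction with the frequency actually observed.

```python
# Minimal calibration check: bin forecasts by predicted probability,
# then compare each bin's mean prediction with the observed frequency.
# All data below is hypothetical, for illustration only.

def calibration_table(forecasts, outcomes, n_bins=5):
    """Group (probability, outcome) pairs into n_bins equal-width bins
    and report, per bin: mean predicted probability, observed frequency,
    and the number of forecasts in the bin."""
    bins = [[] for _ in range(n_bins)]
    for p, happened in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, happened))
    table = []
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(h for _, h in bucket) / len(bucket)
        table.append((round(mean_pred, 2), round(observed, 2), len(bucket)))
    return table

# Hypothetical rain forecasts vs. whether it actually rained (1 = rain)
forecasts = [0.9, 0.8, 0.9, 0.1, 0.2, 0.6, 0.7, 0.1, 0.9, 0.5]
outcomes  = [1,   1,   0,   0,   0,   1,   1,   0,   1,   0]

for mean_pred, observed, n in calibration_table(forecasts, outcomes):
    print(f"predicted ~{mean_pred:.0%}, observed {observed:.0%} over {n} days")
```

A well-calibrated forecaster shows predicted and observed values close together in every bin; large gaps (like 100% predicted vs. 67% observed in the story below) are exactly what this table exposes.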

J.D. Eggleston lived with his son in Kansas City. Kansas City has a peculiar climate, really varied and extreme, with hellish summers, droughts and tornadoes (as every ‘The Wizard of Oz’ reader knows).

One day J.D. Eggleston asked himself how good the local TV weather forecasts were. They turned out to be really poor (see the figure):

The question is: why do they prefer to present such a miserable result (when they declared a forecast probability of 100%, they ended up with only 67% accuracy) instead of relying on, for instance, the National Weather Service, which turned out to be far more reliable and, above all, free?

The answer lies in this payoff matrix, which reflects the way weather forecast channels usually perceive their job:

In a payoff matrix, every choice in a decision-making process has a *cost*.

If your prediction about tomorrow turns out to be accurate (like the first and third lines in this table), this is good. Unfortunately, this is also what you are paid for, so it is positive but not impressive: let’s give it a score of +100.

Otherwise, if your prediction about tomorrow turns out to be wrong, we have to face two very different scenarios.

On one hand we have a false positive: it was supposed to be rainy but it is not. Despite being a mistake, this situation can be perceived by users as positive: *“I planned to spend my Sunday at home watching football, but it turns out to be a sunny day, so I am going out with a smile on my face”* (if this sounds incredible to you, please see the Wikipedia article on Serendipity to learn more about this subject, and please, take your life easy, man).

So this is positive anyway: let’s give it a score of +50.

On the other hand, we have a false negative: it was supposed not to rain but it is actually raining. *“Damn weather forecasts! You ruined my weekend and I hate you! I will never trust you again!”* This scenario is by far the worst possible, so let’s give it a score of -1000.

In such a payoff matrix your main effort is simply *to avoid ending up in this last scenario*: every other scenario is positive, even the one where you are wrong with a false positive.

Now, do you understand why the local meteorologists turned out to be so bad at their predictions?

They did it on purpose; it was a matter of *calibration*: even when there was a significant probability that it would not rain, they chose to be cautious and state that it could rain anyway. The threshold for nice weather was intentionally pushed to high values of certitude: only when they were extremely sure it would be a nice sunny day did they declare a forecast of a nice sunny day.
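The asymmetric scores make this bias easy to quantify. Below is a minimal sketch (the function names and the break-even calculation are mine, added for illustration; only the +100/+50/-1000 scores come from the text): it computes the expected payoff of each possible call given a probability of rain.

```python
# Sketch of the payoff matrix described above: +100 for a correct
# forecast, +50 for a false positive ("rain" but sunny), and -1000
# for a false negative ("sunny" but rain). Everything beyond those
# scores is illustrative.

def expected_payoff(forecast, p_rain):
    """Expected score of a forecast, given the probability of rain."""
    if forecast == "rain":
        # correct if it rains (+100), false positive otherwise (+50)
        return p_rain * 100 + (1 - p_rain) * 50
    # correct if sunny (+100), false negative if it rains (-1000)
    return (1 - p_rain) * 100 + p_rain * (-1000)

def best_call(p_rain):
    """The forecast with the higher expected payoff."""
    return max(("rain", "sunny"), key=lambda f: expected_payoff(f, p_rain))

# Even with only a 10% chance of rain, forecasting rain pays better:
print(best_call(0.10))  # -> rain

# Setting the two expected payoffs equal, 100p + 50(1-p) = 100(1-p) - 1000p,
# gives the break-even point p = 50/1150: "sunny" only wins below ~4.3%.
threshold = 50 / 1150
print(f"forecast 'sunny' only when p(rain) < {threshold:.1%}")
```

With these scores, a rational forecaster announces a sunny day only when the chance of rain drops below roughly 4.3%, which is exactly the “threshold pushed to high values of certitude” described above.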

So, as my university researcher taught, in Probability and Statistics some results are not just a potshot, and “every time you see a probabilistic value, ask yourself who stands behind it, and then ask yourself why”.