Sunday, October 25, 2009

Bayes' theorem BUNKERIZED - You are busted, Mr. Bayes^^

This post continues the articles about Bayes' theorem
previously posted by Paskal on this blog and on the blog of Ascot Project.

Bayes' theorem is indeed a typical example of how we, at BunkerSofa,
approach the dogmas and technological paradigms currently in use.
In particular, we like to spot logical fallacies as well as technical misinterpretations, and to argue why they are wrong and can lead to systems that do not work at all.
The act of spotting, isolating and criticizing them using Bunkersofist values ("putting them into quarantine", if I dare say) is what I want to call to BUNKERIZE.

Also, it is very important to be aware that Bayes' theorem is heavily used in the fields of Speech Recognition, Information Retrieval, Natural Language Processing and anti-spam methods, so I think it is all the more crucial to question its validity.

So let's get to the point: what on earth is wrong with Bayes' theorem, a solid and very respected theorem that has hardly ever been questioned?
Well, apart from the fact that it is based on a triviality, nothing from the mathematical standpoint. But when it comes to its usage, for example in the Computer Science field, problems arise.

1) Triviality of the Bayes' theorem:

First let me remind you what the theorem is about.
The famous theorem only states that:
P(A/B)=P(B/A)*P(A)/P(B)

where A and B are 2 given events with P(B) nonzero, and P is a probability.
So, in fact, the theorem can almost be reduced to stating that
A & B = B & A

Why?
Well, let me derive Bayes' theorem again in 3 lines.

a) As Bayes defined, I define:
P(A/B)=P(A&B)/P(B)
P(B/A)=P(B&A)/P(A)

b) But A&B = B&A so P(A&B)=P(B&A)
c) Then P(A/B)*P(B)=P(B/A)*P(A), that is, P(A/B)=P(B/A)*P(A)/P(B)

So all it does is rely on the fact that A&B = B&A is true.
This is indeed mathematically true because "&" is, by its very definition, a symmetric operation.
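
To make the 3 lines above concrete, here is a minimal sketch that checks them numerically. The joint probabilities are made-up numbers chosen purely for illustration (they come from no real data), and the variable names are mine. It computes P(A/B) and P(B/A) from the definitions in a), and checks that they are linked by the usual form P(A/B)=P(B/A)*P(A)/P(B).

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Keys are (a, b), where 1 means "the event occurred". Values are invented.
joint = {
    (1, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10),
    (0, 1): Fraction(1, 10),
    (0, 0): Fraction(4, 10),
}

# Marginal probabilities P(A) and P(B).
p_a = sum(p for (a, _), p in joint.items() if a == 1)
p_b = sum(p for (_, b), p in joint.items() if b == 1)

# Step b): "A and B" and "B and A" are the same cell of the table.
p_a_and_b = joint[(1, 1)]

# Step a): definitions of the conditional probabilities.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Step c): both sides of Bayes' theorem agree.
bayes_rhs = p_b_given_a * p_a / p_b
assert p_a_given_b == bayes_rhs
print(p_a_given_b, bayes_rhs)
```

Exact fractions are used instead of floats so that the equality check is not blurred by rounding.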

2) Misinterpretations of the Bayes' theorem:

Well, how you interpret (A&B = B&A) is the source of all problems.
(A&B = B&A) is a binary formula that does not take Time into account at all: it is non-temporal, that is, Time just does not exist inside it.

The question then is: how do you interpret what it means for the real world, in practice where Time exists and flows?
To understand the intricacies, let's focus in particular on (A&B) and speculate on its meaning.

a) Is it A immediately followed by B?
If so, followed within how much time?

b) Does A have to occur at the exact same instant as B?

Well, the answer is neither a) nor b) for the very reason that the theorem does not consider the flow of Time.

- So if you want to apply the theorem to the real world, you have to consider cases where a) and b) cannot be told apart, so that you are not confronted with the interpretation problem.
Then it is no big deal: Bayes works perfectly, but the cases are restricted.

Consider the example: A="To be a girl" and B="To wear a dress".
Another easy one is: A="The glass is filled" and B="Red wine has been poured into it".

As you may have noticed, these 2 examples are about events that are macroscopic enough for the human brain to consider them non-temporal.
And this is the condition for the theorem to be applied correctly in the real world.

The condition is the following:
You have to consider only situations where the events are such that it does not matter whether A is prior or subsequent to B.

- But apparently, many computer scientists have not understood this crucial condition well.
They currently apply it quite vaguely to Speech Recognition, where they make, without much care, statements like (A&B)=(B&A) where, for example, A=(utterance of the phoneme "bu") and B=(utterance of the phoneme "zz").

Obviously, ("bu"&"zz") is not the same as ("zz"&"bu")...

That's pretty much it for now!
Bayes' theorem has just been BUNKERIZED.
You are busted, Mr. Bayes!^^

I hope you will use this quarantine of a virus of the mind (i.e. Bayes' theorem) as an opportunity to study it and to learn how to use it more carefully in your applications.

Also, this may be a new movement at BunkerSofa and a fun application of the Bunkersofist philosophy: we could Bunkerize viruses of the mind that are dangerous, stick to the human brain and are based on fallacies.

Should we call ourselves the MemeBusters?^^

1. Amazing graphical illustration, julien.
Amazing Bunkersofigram.

I do think that this post is the best illustration so far of what bunkersofism consists of.

Questioning things that are taken for granted and reinventing things.

It is like Einstein, who doubted the mechanical view that Newton had of the universe.
If concepts or theories remain too long in people's brains, they become viruses of the mind, and therefore they have got to be quarantined, that is, "bunkerized"!

2. A Japanese translation will be out soon too; please wait a little longer.

3. Bayesian statistics has always been controversial. Your critique is vacuous. If you actually did your research you would find that REAL criticisms of Bayesian statistics are subtle and complex, and no single blog post is going to "debunk" a well established field. Philosophical considerations aside, Bayesian statistics has for decades proven itself an invaluable tool across disciplines. Why not, instead of blogging about science like you know something about it, go to school and do REAL SCIENCE and see if you can make an actual difference.

4. Mr Anonymous, show me where the reasoning is false and prove me wrong. txs.

5. The fact that a theory is trusted does not make it valid and right.
Please remember the Ptolemaic system which had been used for more than one thousand years.
http://en.wikipedia.org/wiki/Geocentric_model#Ptolemaic_system

Tks.

6. >1) Triviality of the Bayes' theorem

All you have done here is show the derivation of Bayes' theorem in reverse, as seen here: http://en.wikipedia.org/wiki/Bayes'_theorem#Derivation_from_conditional_probabilities

>So all it does is relying on the fact that A&B = B&A is true.

In general, you cannot show the "triviality" of any mathematical statement by reducing it to another. In fact, all theorems are based on simpler, previously known facts. Instead a theorem must be judged first on correctness, and second on how useful it is in proving other theorems (or how useful it is in applications). Bayes' Theorem is correct, as you have shown, and it has been used extensively in many fields.

>(A&B = B&A) is a binary formula that does not take Time into account at all [...]

The only issue here is how one defines their random variables A and B. In domains where time is an issue it is accounted for. Commonly you will see random variables subscripted by 't' if time is at play.
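
As a rough illustration of that point, here is a small sketch in which the time index is part of the event definition. The sequence probabilities are invented for the example, and the helper name `prob` is hypothetical. With A = ("bu" at time 1) and B = ("zz" at time 2), the intersection is symmetric and Bayes' rule holds, while the reversed utterance is simply a different event with its own probability.

```python
from fractions import Fraction

# Hypothetical distribution over two-phoneme sequences (time 1, time 2).
seq_prob = {
    ("bu", "zz"): Fraction(6, 10),
    ("zz", "bu"): Fraction(1, 10),
    ("bu", "bu"): Fraction(2, 10),
    ("zz", "zz"): Fraction(1, 10),
}

def prob(event):
    """Probability of the set of sequences for which event(seq) is true."""
    return sum(p for seq, p in seq_prob.items() if event(seq))

# Time-indexed events: the position is part of the definition.
def A(seq):
    return seq[0] == "bu"   # "bu" uttered at time 1

def B(seq):
    return seq[1] == "zz"   # "zz" uttered at time 2

# The intersection does not care about the order in which we write it.
p_a_and_b = prob(lambda s: A(s) and B(s))
p_b_and_a = prob(lambda s: B(s) and A(s))
assert p_a_and_b == p_b_and_a

# The reversed utterance "zz" then "bu" is a different event altogether.
p_reversed = prob(lambda s: s == ("zz", "bu"))
assert p_reversed != p_a_and_b

# Bayes' rule holds for the time-indexed events.
p_a, p_b = prob(A), prob(B)
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a
assert p_a_given_b == p_b_given_a * p_a / p_b
print(p_a_and_b, p_reversed, p_a_given_b)
```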

>Bayes works perfectly but the cases are restricted.

There are no restrictions on Bayes' theorem, as you proved yourself. Symmetry (p(A,B)=p(B,A)) is true for all A and B which led directly to the derivation.

>They are currently applying it quite vaguely for Speech Recognition [...]

Look at any speech recognition paper, and you will see time accounted for. One of the tools most commonly used in speech recognition is the Dynamic Bayesian Network, which is a model specifically designed for temporal domains. Looking in my graphical models textbook (Probabilistic Graphical Models; Daphne Koller & Nir Friedman) there is an entire section on temporal models.

>virus of the mind (i.e. Bayes' theorem)

For perspective, Bayes' theorem has been around since 1763. Yes the Ptolemaic system was around for a long time, but you can blame that on the scarcity of quality astronomical observations. On the other hand, mathematics is a field not particularly susceptible to memes. If a long-lived theorem could be disproven or discounted in an afternoon by a blogger, a mathematician would have published such a result long ago because it would have made her famous. In math there aren't many forces that work to perpetuate falsities, because a mathematician could make a career out of finding and eliminating those falsities.

I do think there is a place for skepticism of science, and science journalism. There is such a thing as "bad" science, an obvious example being funding bias. Just be aware that a lot of facts in science have rightfully withstood the test of time. And in any case, do your research. It wouldn't have taken much effort for you to find that probabilistic modelling folks have an entire subfield devoted to temporal models.

If you are interested in legitimate critiques of the use of Bayes' theorem in practice, you could take a look at this: http://www.stat.columbia.edu/~gelman/research/published/badbayesmain.pdf To summarize, Bayes' thm. is often applied like this [see http://en.wikipedia.org/wiki/Bayesian_inference#Posterior_distribution_of_the_binomial_parameter for another example]:

D - data
H - hypothesis

P(H|D) = P(D|H)*P(H) / sum_H { P(D|H)*P(H) }

That is, you wish to evaluate a hypothetical model H using some training data D. Bayesian learning allows you to compute P(H|D) from P(D|H) which can be easy by design. The nasty issue is P(H). This is called the prior distribution over hypotheses. How could you possibly know the probability of a hypothesis when disregarding any evidence? Usually P(H) is chosen to be a conjugate prior which is purely out of computational convenience. The consequences of this have been studied, yet it remains controversial. But in countless applications it (provably) works just fine. It's better than nothing! And there are bright young folks today doing their PhD work in Learning Theory, and hopefully they can come up with something better.
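
For what it's worth, here is a minimal sketch of the formula above on a toy problem, just to show where P(H) enters. Everything in it is an assumption made for illustration: the three candidate coin biases, the data (3 heads in 4 flips), the two priors, and the function names. It uses a small discrete set of hypotheses rather than a conjugate prior, so that the sum over H can be written out directly.

```python
from fractions import Fraction
from math import comb

# Hypotheses H: three candidate values for a coin's probability of heads.
hypotheses = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Data D: 3 heads observed in 4 flips (invented for the example).
heads, flips = 3, 4

def likelihood(theta):
    """P(D|H): binomial probability of the data given bias theta."""
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

def posterior(prior):
    """P(H|D) = P(D|H)*P(H) / sum_H { P(D|H)*P(H) } for each hypothesis."""
    unnormalized = {h: likelihood(h) * prior[h] for h in hypotheses}
    evidence = sum(unnormalized.values())
    return {h: u / evidence for h, u in unnormalized.items()}

# A uniform prior: every hypothesis is equally plausible before seeing data.
uniform_prior = {h: Fraction(1, 3) for h in hypotheses}
post_uniform = posterior(uniform_prior)

# A different prior that strongly favors a fair coin.
fair_prior = {Fraction(1, 4): Fraction(1, 10),
              Fraction(1, 2): Fraction(8, 10),
              Fraction(3, 4): Fraction(1, 10)}
post_fair = posterior(fair_prior)

for h in hypotheses:
    print(h, post_uniform[h], post_fair[h])
```

With the uniform prior the bias 3/4 comes out as the most probable hypothesis, whereas with the second prior the fair coin stays on top for the same data. That sensitivity to the choice of P(H) is exactly the nasty issue described above.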

7. 1) Thanks for the detailed comment.
Looking back at this post,
indeed this passage
"- But apparently, many computer scientists haven't not understood this crucial condition well.
They are currently applying it quite vaguely for Speech Recognition where they claim without much care such, statements like (A&B)=(B&A) where for example, A=("utterance of the phoneme "bu") and B=("utterance of the phoneme "zz").
Obviously, ("bu"&"zz") is not the same as ("zz"&"bu")... "
is inappropriate, since it is not backed by references.

The Hidden Markov Model (http://en.wikipedia.org/wiki/Hidden_Markov_model) is the simplest type of DBN, and it does indeed seem to cope well with the "time" issue by considering transition probabilities.
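
Here is a minimal sketch of what "probabilities of transition" buy you. It is a plain Markov chain over the two phonemes, that is, an HMM with the hidden-state and emission layer left out for brevity. The initial and transition probabilities are invented, and `sequence_probability` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical probability of each phoneme being uttered first.
initial = {"bu": Fraction(1, 2), "zz": Fraction(1, 2)}

# Hypothetical transition probabilities: transition[prev][cur] = P(cur | prev).
transition = {
    "bu": {"bu": Fraction(1, 5), "zz": Fraction(4, 5)},
    "zz": {"bu": Fraction(3, 10), "zz": Fraction(7, 10)},
}

def sequence_probability(seq):
    """Probability of an ordered phoneme sequence under the chain."""
    prob = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= transition[prev][cur]
    return prob

p_bu_zz = sequence_probability(["bu", "zz"])
p_zz_bu = sequence_probability(["zz", "bu"])
print(p_bu_zz, p_zz_bu)  # the two orders get different probabilities
```

So the order of the phonemes is carried by the transition table, and the model never has to treat "bu" then "zz" as the same thing as "zz" then "bu".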

2) "On the other hand, mathematics is a field not particularly susceptible to memes. If a long-lived theorem could be disproven or discounted in an afternoon by a blogger, a mathematician would have published such a result long ago because it would have made her famous."
is, however, not an argument proving that theorems cannot be false.